Bridges and advanced schema features
- Bridge tables: many-to-many relationships between two dim tables, with optional cardinality bounds.
- pool.{attr} columns: surface a segment attribute as a column on the entity dim.
- Sub-entity dims: dim.count = N produces N rows per parent entity (employees per company, users per account).
1. Bridge tables: many-to-many between dims
A bridge is its own table — one row per (left_entity, right_entity) pair. cardinality: (min, max) bounds how many right entities each left entity associates with. The engine samples uniformly within the bounds; pass driver: <metric> to bias selection by trajectory position.
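The sampling rule described above can be illustrated outside the engine. What follows is a minimal sketch in plain NumPy and pandas, not plotsim's implementation: `build_bridge`, its arguments, and the `left_id` / `right_id` column names are hypothetical. It assumes that the number of right entities per left entity is drawn uniformly from the inclusive bounds, and it leaves out the driver-based bias entirely.

```python
import numpy as np
import pandas as pd


def build_bridge(left_ids, right_ids, cardinality, rng):
    """Hypothetical helper: pair each left id with k distinct right ids,
    where k is drawn uniformly from [min, max] inclusive."""
    lo, hi = cardinality
    if hi > len(right_ids):
        raise ValueError("cardinality max exceeds the number of right-side rows")
    rows = []
    for left in left_ids:
        k = int(rng.integers(lo, hi + 1))  # upper bound is exclusive, hence hi + 1
        chosen = rng.choice(right_ids, size=k, replace=False)
        rows.extend((left, int(right)) for right in chosen)
    return pd.DataFrame(rows, columns=["left_id", "right_id"])


rng = np.random.default_rng(0)
bridge = build_bridge(list(range(20)), list(range(80)), (1, 3), rng)

# Number of right entities attached to each left entity.
per_left = bridge.groupby("left_id").size()
print(per_left.min(), per_left.max())
```

Sampling without replacement keeps each (left, right) pair unique, which is why the right side needs at least `max` rows.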
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from plotsim import create, generate_tables

cfg_bridge = create(
    about="Bridge demo",
    unit="company",
    window=("2024-01", "2024-12", "monthly"),
    metrics=[
        {"name": "revenue", "type": "amount", "polarity": "positive",
         "range": [1000, 100000]},
    ],
    segments=[
        {"name": "growth_co", "count": 10, "archetype": "growth"},
        {"name": "steady_co", "count": 10, "archetype": "flat"},
    ],
    dimensions=[
        {"name": "dim_date", "per": "period", "columns": [
            {"name": "date_key", "type": "id"},
            {"name": "date", "type": "date"},
            {"name": "year", "type": "int"},
            {"name": "month", "type": "int"},
        ]},
        {"name": "dim_company", "per": "unit", "columns": [
            {"name": "company_id", "type": "id"},
            {"name": "company_name", "type": "faker.company"},
        ]},
        # Sub-entity dim: 4 users per company. Used here as the bridge's
        # right-side dim. Both sides of a bridge must be declared dims with
        # enough rows to satisfy `cardinality.max`.
        {"name": "dim_user", "per": "unit", "count": 4, "columns": [
            {"name": "user_id", "type": "id"},
            {"name": "company_id", "type": "ref.dim_company"},
            {"name": "user_name", "type": "faker.name"},
        ]},
    ],
    bridges=[
        # Each company associates with 1..3 user records. Beyond the
        # primary `company_id` FK on dim_user, this M:N link models extra
        # access or cross-company assignments.
        {"name": "bridge_company_user",
         "left": "dim_company", "right": "dim_user",
         "cardinality": (1, 3),
         "driver": "revenue",
         "columns": [
             {"name": "weight", "type": "metric.revenue"},
         ]},
    ],
)

tables = generate_tables(cfg_bridge, np.random.default_rng(cfg_bridge.seed))
print(f"Tables: {sorted(tables)}")
print(f"\nbridge_company_user: {len(tables['bridge_company_user'])} rows")
tables["bridge_company_user"].head()

# Cardinality distribution: how many users per company?
counts = (
    tables["bridge_company_user"]
    .groupby("company_id")
    .size()
    .value_counts()
    .sort_index()
)
counts.plot(kind="bar", figsize=(7, 3),
            title="Cardinality: users per company in the bridge (configured 1..3)")
plt.xlabel("# users")
plt.ylabel("# companies")
plt.tight_layout()
plt.show()
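The bar chart shows the distribution visually; the same bound can also be checked programmatically. The sketch below is one possible way to do that with pandas alone. `cardinality_violations` is a hypothetical helper, and the small `demo` frame is stand-in data rather than engine output (in the notebook you would pass `tables["bridge_company_user"]` instead).

```python
import pandas as pd


def cardinality_violations(bridge, left_key, lo, hi):
    """Return the per-left-entity row counts that fall outside [lo, hi]."""
    counts = bridge.groupby(left_key).size()
    return counts[(counts < lo) | (counts > hi)]


# Stand-in data: company 3 is linked to four users, one more than allowed.
demo = pd.DataFrame({
    "company_id": [1, 1, 2, 3, 3, 3, 3],
    "user_id": [10, 11, 12, 13, 14, 15, 16],
})

bad = cardinality_violations(demo, "company_id", 1, 3)
print(bad.to_dict())
```

A left entity with no bridge rows at all never appears in the `groupby` result, so a lower bound of 1 or more should additionally be checked against the full key set of the left dim.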
2. pool.{attr}: segment attributes as dim columns
Every segment can carry arbitrary attributes (single values or lists). When a dim column declares type: "pool.<attr_name>", the engine populates that column from the entity's segment attribute. Lists get sampled per entity; scalars repeat.
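The "lists get sampled, scalars repeat" rule can be sketched in a few lines. This is an illustration under stated assumptions, not the engine's code: `resolve_pool` is a hypothetical helper, and uniform sampling with replacement is assumed because the text above says only that lists are sampled per entity.

```python
import numpy as np


def resolve_pool(attr_value, n_entities, rng):
    """Hypothetical helper: one value per entity for a segment attribute."""
    if isinstance(attr_value, (list, tuple)):
        # List attribute: draw one value per entity (uniform, with replacement).
        return [str(v) for v in rng.choice(attr_value, size=n_entities)]
    # Scalar attribute: every entity in the segment gets the same value.
    return [attr_value] * n_entities


rng = np.random.default_rng(0)
attributes = {"tier": "starter", "region": ["US", "EMEA"], "plan": "free"}
columns = {name: resolve_pool(value, 15, rng) for name, value in attributes.items()}

print(set(columns["tier"]), set(columns["region"]) <= {"US", "EMEA"})
```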
cfg_pool = create(
    about="pool.{attr} demo",
    unit="customer",
    window=("2024-01", "2024-12", "monthly"),
    metrics=[
        {"name": "spend", "type": "amount", "polarity": "positive",
         "range": [10, 500]},
    ],
    segments=[
        {"name": "enterprise", "count": 15, "archetype": "growth",
         "attributes": {
             "tier": "enterprise",
             "region": ["US", "EMEA", "APAC"],
             "plan": ["pro", "enterprise"],
         }},
        {"name": "starter", "count": 15, "archetype": "flat",
         "attributes": {
             "tier": "starter",
             "region": ["US", "EMEA"],
             "plan": "free",
         }},
    ],
    dimensions=[
        {"name": "dim_date", "per": "period", "columns": [
            {"name": "date_key", "type": "id"},
            {"name": "date", "type": "date"},
            {"name": "year", "type": "int"},
            {"name": "month", "type": "int"},
        ]},
        {"name": "dim_customer", "per": "unit", "columns": [
            {"name": "customer_id", "type": "id"},
            {"name": "customer_name", "type": "faker.name"},
            {"name": "tier", "type": "pool.tier"},
            {"name": "region", "type": "pool.region"},
            {"name": "plan", "type": "pool.plan"},
            {"name": "cohort_size", "type": "segment.count"},
        ]},
    ],
)

tables = generate_tables(cfg_pool, np.random.default_rng(cfg_pool.seed))
dim = tables["dim_customer"]
print(f"dim_customer columns: {list(dim.columns)}")
print("\nRegion distribution per tier:")
dim.groupby("tier")["region"].value_counts()
3. Sub-entity dims (count: N) for hierarchies
Set count: N on a per: unit dim and the engine emits N rows per parent entity. This is how you model 1:N hierarchies — employees within companies, users within accounts, devices within households — without writing a separate generation step.
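To make the "N rows per parent" shape concrete, here is a rough pandas equivalent of the expansion. It is only a sketch of the resulting table layout, not how the engine generates it; the `parents` frame and the `n_children` name are illustrative stand-ins.

```python
import pandas as pd

parents = pd.DataFrame({"company_id": [1, 2, 3]})
n_children = 4  # plays the role of count: N

# Repeat each parent row N times, then assign a fresh child key.
children = parents.loc[parents.index.repeat(n_children)].reset_index(drop=True)
children.insert(0, "user_id", range(1, len(children) + 1))

print(children.groupby("company_id").size().to_dict())
```

Each child row carries its parent's key, which is the role the ref.dim_company column plays in the config.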
cfg_sub = create(
    about="Sub-entity dim demo",
    unit="company",
    window=("2024-01", "2024-12", "monthly"),
    metrics=[
        {"name": "revenue", "type": "amount", "polarity": "positive",
         "range": [1000, 100000]},
    ],
    segments=[
        {"name": "smb", "count": 5, "archetype": "growth",
         "attributes": {"tier": "smb"}},
        {"name": "midmarket", "count": 5, "archetype": "growth",
         "attributes": {"tier": "midmarket"}},
    ],
    dimensions=[
        {"name": "dim_date", "per": "period", "columns": [
            {"name": "date_key", "type": "id"},
            {"name": "date", "type": "date"},
            {"name": "year", "type": "int"},
            {"name": "month", "type": "int"},
        ]},
        {"name": "dim_company", "per": "unit", "columns": [
            {"name": "company_id", "type": "id"},
            {"name": "company_name", "type": "faker.company"},
            {"name": "tier", "type": "pool.tier"},
        ]},
        {"name": "dim_user", "per": "unit", "count": 5, "columns": [
            {"name": "user_id", "type": "id"},
            {"name": "company_id", "type": "ref.dim_company"},
            {"name": "user_name", "type": "faker.name"},
            {"name": "role", "type": "static.member"},
        ]},
    ],
)

tables = generate_tables(cfg_sub, np.random.default_rng(cfg_sub.seed))
print(f"dim_company: {len(tables['dim_company'])} rows")
print(f"dim_user: {len(tables['dim_user'])} rows "
      f"({len(tables['dim_user']) // len(tables['dim_company'])} per company)")

# Show the parent-child relationship
joined = (
    tables["dim_user"]
    .merge(tables["dim_company"][["company_id", "company_name", "tier"]],
           on="company_id")
)
joined.head(8)
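The join above relies on every dim_user row pointing at exactly one dim_company row. If you want to assert that rather than eyeball it, a check along these lines may help. It uses only standard pandas merge options; the two small frames are stand-in data, not engine output.

```python
import pandas as pd

dim_company = pd.DataFrame({"company_id": [1, 2], "tier": ["smb", "midmarket"]})
dim_user = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "company_id": [1, 1, 2, 2],
})

# validate="many_to_one" raises MergeError if company_id is duplicated in the parent;
# indicator=True adds a _merge column that flags children with no matching parent.
checked = dim_user.merge(dim_company, on="company_id", how="left",
                         validate="many_to_one", indicator=True)
orphans = checked[checked["_merge"] == "left_only"]

# Every parent should own the same number of children when count: N is fixed.
per_parent = dim_user.groupby("company_id").size()
print(len(orphans), per_parent.nunique() == 1)
```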
Where to next
- Schema and dimensions: schema_and_dimensions.ipynb covers the full column-type vocabulary including SCD type 2.
- DE use cases: de_use_cases.ipynb shows how bridge cardinality assertions feed into pipeline tests.
- Schema guide: docs/site/user-guide/schema-guide.md covers bridges, sub-entity dim counts, and pool.{attr} columns.