Bridges and advanced schema features¶

Bridge tables: many-to-many relationships between two dim tables, with optional cardinality bounds.
pool.{attr} columns: surface a segment attribute as a column on the entity dim.
Sub-entity dims: setting count: N on a dim produces N rows per parent entity (employees per company, users per account).

1. Bridge tables — many-to-many between dims¶

A bridge is its own table, with one row per (left_entity, right_entity) pair. cardinality: (min, max) bounds how many right entities each left entity is associated with. By default the engine samples uniformly within those bounds; pass driver: <metric> to bias selection by trajectory position instead.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from plotsim import create, generate_tables

cfg_bridge = create(
    about="Bridge demo",
    unit="company",
    window=("2024-01", "2024-12", "monthly"),
    metrics=[
        {"name": "revenue", "type": "amount", "polarity": "positive",
         "range": [1000, 100000]},
    ],
    segments=[
        {"name": "growth_co",  "count": 10, "archetype": "growth"},
        {"name": "steady_co",  "count": 10, "archetype": "flat"},
    ],
    dimensions=[
        {"name": "dim_date", "per": "period", "columns": [
            {"name": "date_key", "type": "id"},
            {"name": "date",     "type": "date"},
            {"name": "year",     "type": "int"},
            {"name": "month",    "type": "int"},
        ]},
        {"name": "dim_company", "per": "unit", "columns": [
            {"name": "company_id",   "type": "id"},
            {"name": "company_name", "type": "faker.company"},
        ]},
        # Sub-entity dim — 4 users per company. Used here as the bridge's
        # right-side dim. Both sides of a bridge must be declared dims with
        # enough rows to satisfy `cardinality.max`.
        {"name": "dim_user", "per": "unit", "count": 4, "columns": [
            {"name": "user_id",    "type": "id"},
            {"name": "company_id", "type": "ref.dim_company"},
            {"name": "user_name",  "type": "faker.name"},
        ]},
    ],
    bridges=[
        # Each company associates with 1..3 user records (M:N — beyond the
        # primary `company_id` FK on dim_user, this models extra access /
        # cross-company assignments).
        {"name": "bridge_company_user",
         "left": "dim_company", "right": "dim_user",
         "cardinality": (1, 3),
         "driver": "revenue",
         "columns": [
             {"name": "weight", "type": "metric.revenue"},
         ]},
    ],
)
tables = generate_tables(cfg_bridge, np.random.default_rng(cfg_bridge.seed))
print(f"Tables: {sorted(tables)}")
print(f"\nbridge_company_user: {len(tables['bridge_company_user'])} rows")
tables["bridge_company_user"].head()
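
To make the uniform case concrete, here is a minimal standalone sketch of bounded many-to-many sampling. It is not plotsim's implementation: `sample_bridge` is a hypothetical helper, the `user_id` column name is assumed, and the sketch ignores `driver` entirely. Each left id draws a count uniformly from the cardinality bounds and is then paired with that many distinct right ids.

```python
import numpy as np
import pandas as pd

def sample_bridge(left_ids, right_ids, cardinality, rng):
    """Hypothetical helper: pair each left id with k distinct right ids,
    where k is drawn uniformly from the inclusive range [lo, hi]."""
    lo, hi = cardinality
    if hi > len(right_ids):
        raise ValueError("cardinality max exceeds the number of right-side rows")
    rows = []
    for left in left_ids:
        k = int(rng.integers(lo, hi + 1))  # upper bound is exclusive, so add 1
        chosen = rng.choice(right_ids, size=k, replace=False)  # no duplicate pairs
        rows.extend((left, int(right)) for right in chosen)
    return pd.DataFrame(rows, columns=["company_id", "user_id"])

# 20 companies and 80 users, mirroring the sizes in the config above.
sketch_rng = np.random.default_rng(0)
sketch_bridge = sample_bridge(range(1, 21), np.arange(1, 81), (1, 3), sketch_rng)
per_company = sketch_bridge.groupby("company_id").size()
print(per_company.value_counts().sort_index())
```

The guard on `hi` reflects the constraint noted in the config comment: the right-side dim needs enough rows to satisfy the cardinality maximum.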

# Cardinality distribution — how many users per company?
counts = (
    tables["bridge_company_user"]
    .groupby("company_id")
    .size()
    .value_counts()
    .sort_index()
)
counts.plot(kind="bar", figsize=(7, 3),
            title="Cardinality — users per company in the bridge (configured 1..3)")
plt.xlabel("# users"); plt.ylabel("# companies")
plt.tight_layout(); plt.show()
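
The same group-and-count idea can back an assertion rather than a plot. The sketch below runs against a small hand-built frame so it stands alone; `check_bridge_cardinality` is a hypothetical helper, not part of plotsim, and the `user_id` column name is assumed.

```python
import pandas as pd

def check_bridge_cardinality(bridge, left_key, lo, hi):
    """Hypothetical helper: return the left keys whose number of bridge rows
    falls outside the inclusive range [lo, hi]."""
    sizes = bridge.groupby(left_key).size()
    return sizes[(sizes < lo) | (sizes > hi)]

# Toy bridge: company 3 has four users, one more than the configured maximum.
toy_bridge = pd.DataFrame({
    "company_id": [1, 1, 2, 3, 3, 3, 3],
    "user_id":    [10, 11, 12, 13, 14, 15, 16],
})
violations = check_bridge_cardinality(toy_bridge, "company_id", 1, 3)
print(violations)
```

A left entity with no bridge rows at all never appears in the groupby result, so enforcing a minimum of 1 or more would also require comparing against the full list of ids in the left dim.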

2. `pool.{attr}` — segment attributes as dim columns¶

Every segment can carry arbitrary attributes (single values or lists). When a dim column declares type: "pool.<attr_name>", the engine populates that column from the attribute of the segment each entity belongs to. A list is sampled once per entity; a scalar is repeated for every entity in the segment.

cfg_pool = create(
    about="pool.{attr} demo",
    unit="customer",
    window=("2024-01", "2024-12", "monthly"),
    metrics=[
        {"name": "spend", "type": "amount", "polarity": "positive",
         "range": [10, 500]},
    ],
    segments=[
        {"name": "enterprise", "count": 15, "archetype": "growth",
         "attributes": {
             "tier":   "enterprise",
             "region": ["US", "EMEA", "APAC"],
             "plan":   ["pro", "enterprise"],
         }},
        {"name": "starter", "count": 15, "archetype": "flat",
         "attributes": {
             "tier":   "starter",
             "region": ["US", "EMEA"],
             "plan":   "free",
         }},
    ],
    dimensions=[
        {"name": "dim_date", "per": "period", "columns": [
            {"name": "date_key", "type": "id"},
            {"name": "date",     "type": "date"},
            {"name": "year",     "type": "int"},
            {"name": "month",    "type": "int"},
        ]},
        {"name": "dim_customer", "per": "unit", "columns": [
            {"name": "customer_id", "type": "id"},
            {"name": "customer_name", "type": "faker.name"},
            {"name": "tier",        "type": "pool.tier"},
            {"name": "region",      "type": "pool.region"},
            {"name": "plan",        "type": "pool.plan"},
            {"name": "cohort_size", "type": "segment.count"},
        ]},
    ],
)
tables = generate_tables(cfg_pool, np.random.default_rng(cfg_pool.seed))
dim = tables["dim_customer"]
print(f"dim_customer columns: {list(dim.columns)}")
print("\nRegion distribution per tier:")
dim.groupby("tier")["region"].value_counts()
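
The list-versus-scalar rule can be illustrated without the engine. This is a sketch under assumptions: `resolve_pool` is a hypothetical helper, and drawing uniformly with replacement is a guess at the sampling behaviour, not something the notebook states.

```python
import numpy as np

def resolve_pool(attr_value, n_entities, rng):
    """Hypothetical helper: a list yields one independent draw per entity,
    a scalar is repeated for every entity."""
    if isinstance(attr_value, (list, tuple)):
        # Assumption: uniform draws with replacement.
        return [str(v) for v in rng.choice(attr_value, size=n_entities)]
    return [attr_value] * n_entities

pool_rng = np.random.default_rng(0)
starter_attrs = {"tier": "starter", "region": ["US", "EMEA"], "plan": "free"}
columns = {
    name: resolve_pool(value, 15, pool_rng)
    for name, value in starter_attrs.items()
}
print(set(columns["tier"]), set(columns["plan"]))
print(set(columns["region"]) <= {"US", "EMEA"})
```

Under this rule the starter segment's tier and plan columns hold a single value each, while region varies between entities but never leaves the segment's own list.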

3. Sub-entity dims — `count: N` for hierarchies¶

Set count: N on a per: unit dim and the engine emits N rows per parent entity. This is how you model 1:N hierarchies — employees within companies, users within accounts, devices within households — without writing a separate generation step.

cfg_sub = create(
    about="Sub-entity dim demo",
    unit="company",
    window=("2024-01", "2024-12", "monthly"),
    metrics=[
        {"name": "revenue", "type": "amount", "polarity": "positive",
         "range": [1000, 100000]},
    ],
    segments=[
        {"name": "smb",     "count": 5, "archetype": "growth",
         "attributes": {"tier": "smb"}},
        {"name": "midmarket","count": 5, "archetype": "growth",
         "attributes": {"tier": "midmarket"}},
    ],
    dimensions=[
        {"name": "dim_date", "per": "period", "columns": [
            {"name": "date_key", "type": "id"},
            {"name": "date",     "type": "date"},
            {"name": "year",     "type": "int"},
            {"name": "month",    "type": "int"},
        ]},
        {"name": "dim_company", "per": "unit", "columns": [
            {"name": "company_id",   "type": "id"},
            {"name": "company_name", "type": "faker.company"},
            {"name": "tier",         "type": "pool.tier"},
        ]},
        {"name": "dim_user", "per": "unit", "count": 5, "columns": [
            {"name": "user_id",    "type": "id"},
            {"name": "company_id", "type": "ref.dim_company"},
            {"name": "user_name",  "type": "faker.name"},
            {"name": "role",       "type": "static.member"},
        ]},
    ],
)
tables = generate_tables(cfg_sub, np.random.default_rng(cfg_sub.seed))
print(f"dim_company: {len(tables['dim_company'])} rows")
print(f"dim_user:    {len(tables['dim_user'])} rows  "
      f"({len(tables['dim_user']) // len(tables['dim_company'])} per company)")

# Show the parent-child relationship
joined = (tables["dim_user"]
          .merge(tables["dim_company"][["company_id", "company_name", "tier"]],
                 on="company_id"))
joined.head(8)
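
A 1:N hierarchy like this has two properties worth checking downstream: every parent has exactly N children, and every child's foreign key resolves to a parent. The sketch below verifies both with pandas on hand-built frames; the frames and variable names are illustrative only and do not come from plotsim.

```python
import pandas as pd

# Toy parent and child dims: 3 companies with 2 users each.
companies = pd.DataFrame({"company_id": [1, 2, 3]})
users = pd.DataFrame({
    "user_id": range(1, 7),
    "company_id": [1, 1, 2, 2, 3, 3],
})
n_per_parent = 2

# Children per parent, including parents that have no children at all.
children = (
    users.groupby("company_id")
    .size()
    .reindex(companies["company_id"], fill_value=0)
)
# Child rows whose foreign key matches no parent row.
orphans = users[~users["company_id"].isin(companies["company_id"])]

print((children == n_per_parent).all(), len(orphans))
```

Reindexing against the parent dim matters here: a plain groupby would silently omit a company with zero users instead of reporting a count of 0.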

Where to next¶

Schema and dimensions — schema_and_dimensions.ipynb covers the full column-type vocabulary including SCD type 2.
DE use cases — de_use_cases.ipynb shows how bridge cardinality assertions feed into pipeline tests.
Schema guide — docs/site/user-guide/schema-guide.md covers bridges, sub-entity dim counts, and pool.{attr} columns.
