plotsim
Generate multi-table synthetic datasets where the metrics tell a story — not random noise.
Most synthetic data tools generate columns independently. Revenue is random. Engagement is random. Churn is random. The numbers fill a schema, but they don't behave like real data — because in real data, these things move together.
plotsim generates relational test data with shape: every entity follows a behavioral trajectory, and every metric — across every table, every foreign key, every time period — is derived from the same trajectory position. When engagement rises, revenue follows. When it declines, churn fires.
See it
The same SaaS schema, generated two ways. One company, twelve months.
Every column is independent. The numbers don't agree with each other.
| month | engagement | mrr | tickets | churn_risk |
|---|---|---|---|---|
| 2024-01 | 0.842 | $483 | 7 | 0.611 |
| 2024-02 | 0.117 | $4,201 | 0 | 0.043 |
| 2024-03 | 0.674 | $1,089 | 11 | 0.892 |
| 2024-04 | 0.298 | $112 | 2 | 0.355 |
| 2024-05 | 0.951 | $7,733 | 4 | 0.018 |
| 2024-06 | 0.024 | $964 | 9 | 0.477 |
| 2024-07 | 0.560 | $2,154 | 1 | 0.802 |
| 2024-08 | 0.405 | $328 | 6 | 0.220 |
| 2024-09 | 0.789 | $617 | 0 | 0.998 |
| 2024-10 | 0.131 | $5,440 | 8 | 0.156 |
| 2024-11 | 0.847 | $192 | 3 | 0.501 |
| 2024-12 | 0.334 | $3,876 | 12 | 0.063 |
In May, engagement of 0.951 pairs with a churn risk of 0.018; in September, engagement of 0.789 pairs with the highest churn risk in the table (0.998). There is no story here, only fields filled.
Same schema, generated by plotsim run saas. The table shows one real dim_company row and 12 of its 24 monthly rows, drawn from fct_engagement, fct_revenue, and fct_support_tickets.
| month | engagement | mrr | tickets | churn_risk |
|---|---|---|---|---|
| 2024-01 | 0.587 | $1,191 | 0 | 0.261 |
| 2024-02 | 0.807 | $1,265 | 1 | 0.189 |
| 2024-03 | 1.000 | $3,532 | 2 | 0.129 |
| 2024-04 | 0.593 | $818 | 0 | 0.171 |
| 2024-05 | 0.904 | $3,567 | 2 | 0.237 |
| 2024-06 | 0.956 | $4,264 | 1 | 0.257 |
| 2024-07 | 1.000 | $302 | 2 | 0.000 |
| 2024-08 | 0.917 | $1,507 | 0 | 0.000 |
| 2024-09 | 1.000 | $890 | 1 | 0.000 |
| 2024-10 | 0.783 | $512 | 1 | 0.264 |
| 2024-11 | 0.956 | $837 | 0 | 0.000 |
| 2024-12 | 0.827 | $351 | 1 | 0.248 |
Engagement climbs from 0.587 toward its plateau at 1.000. MRR rises and falls with it through June. Support tickets never exceed two a month. Churn risk stays below 0.27 and drops to zero in four of the last six months. All four columns read from the same underlying trajectory position, not from four independent random generators.
The contrast is the entire product.
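To make the contrast between the two tables concrete, here is a minimal sketch of the two generation styles. It is not plotsim's engine: the function names, the logistic growth curve, the noise level, and the metric formulas are all assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def clamp(value):
    """Keep a value inside [0, 1]."""
    return min(1.0, max(0.0, value))


def independent_rows(months, rng):
    """Each column comes from its own random draw, so columns never agree."""
    return [
        {"engagement": rng.random(), "churn_risk": rng.random()}
        for _ in range(months)
    ]


def trajectory_rows(months, rng, sigma=0.03):
    """Every column is a noisy function of one shared position in [0, 1]."""
    rows = []
    for t in range(months):
        # Hypothetical growth curve: a logistic ramp toward a plateau.
        position = 1.0 / (1.0 + math.exp(-0.6 * (t - months / 3)))
        rows.append({
            "engagement": clamp(position + rng.gauss(0.0, sigma)),
            "mrr": 500 + 4000 * clamp(position + rng.gauss(0.0, sigma)),
            "churn_risk": clamp(0.6 * (1.0 - position) + rng.gauss(0.0, sigma)),
        })
    return rows


coupled = trajectory_rows(24, random.Random(7))
loose = independent_rows(2000, random.Random(7))

eng_mrr = pearson([r["engagement"] for r in coupled], [r["mrr"] for r in coupled])
eng_churn = pearson([r["engagement"] for r in coupled],
                    [r["churn_risk"] for r in coupled])
loose_corr = pearson([r["engagement"] for r in loose],
                     [r["churn_risk"] for r in loose])
print(round(eng_mrr, 2), round(eng_churn, 2), round(loose_corr, 2))
```

In the trajectory-driven rows, engagement and MRR correlate strongly and engagement and churn risk correlate strongly in the opposite direction. In the independent rows the correlation sits near zero, which is the "fields filled" behavior shown in the first table.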
Install
Requires Python 3.10+. Zero network calls at generation time. All bundled templates work offline.
Run a template
Or from Python — the builder API is the front door:
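```python
from plotsim import create_from_yaml, generate_tables, write_tables

cfg = create_from_yaml("config.yaml")  # load the YAML config
tables = generate_tables(cfg)          # generate every table from the config
write_tables(tables, cfg)              # write the tables to the output directory
```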
Either flow produces a complete star schema:
```text
output/
├── dim_date.csv              # complete date spine
├── dim_company.csv           # entity attributes (with SCD2 plan_tier)
├── dim_user.csv              # sub-entity attributes
├── dim_plan.csv              # reference lookup
├── fct_engagement.csv        # entity × period metrics
├── fct_revenue.csv           # entity × period metrics
├── fct_support_tickets.csv   # entity × period metrics
├── evt_login.csv             # proportional events
├── evt_churn.csv             # threshold-triggered events
├── config.yaml               # frozen copy of the input config
└── validation_report.txt     # FK + PK + spine integrity checks
```
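The three kinds of check named for validation_report.txt (FK, PK, spine) can be sketched as below. This is a minimal illustration, not plotsim's validator: the helper names, the column names company_id and month, and the reading of "spine integrity" as "every entity has a row for every period" are assumptions.

```python
def check_primary_key(rows, key):
    """Return the key values that appear more than once."""
    seen, dupes = set(), set()
    for row in rows:
        value = row[key]
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return dupes


def check_foreign_key(child_rows, fk, parent_rows, pk):
    """Return child FK values that have no matching parent row."""
    parents = {row[pk] for row in parent_rows}
    return {row[fk] for row in child_rows} - parents


def check_spine(fact_rows, entity_key, period_key, periods):
    """Return (entity, period) pairs missing from an entity x period table."""
    present = {(row[entity_key], row[period_key]) for row in fact_rows}
    entities = {row[entity_key] for row in fact_rows}
    return {(e, p) for e in entities for p in periods} - present


# Tiny hand-written tables; all column names and values are hypothetical.
dim_company = [{"company_id": 1}, {"company_id": 2}]
dim_date = [{"month": "2024-01"}, {"month": "2024-02"}]
fct_revenue = [
    {"company_id": 1, "month": "2024-01", "mrr": 1191},
    {"company_id": 1, "month": "2024-02", "mrr": 1265},
    {"company_id": 2, "month": "2024-01", "mrr": 640},
    {"company_id": 3, "month": "2024-01", "mrr": 75},  # orphan: no company 3
]

periods = [row["month"] for row in dim_date]
pk_dupes = check_primary_key(dim_company, "company_id")
fk_orphans = check_foreign_key(fct_revenue, "company_id", dim_company, "company_id")
spine_gaps = check_spine(fct_revenue, "company_id", "month", periods)
print(pk_dupes, fk_orphans, spine_gaps)
```

Each helper returns the offending values rather than a boolean, so an empty set means the check passed and a non-empty set says exactly which rows to inspect.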
If a company's engagement trajectory declines, its login events decrease in evt_login.csv and churn events appear in evt_churn.csv — because both tables read from the same underlying trajectory.
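The two event styles in the paragraph above can be sketched as follows, assuming a trajectory position in [0, 1] per period. The function names, the cap of 40 logins, and the 0.2 churn threshold are hypothetical and are not taken from plotsim.

```python
import random


def login_count(position, rng, max_logins=40):
    """Proportional event: each of max_logins chances fires with probability
    equal to the position, so the expected count scales with the trajectory."""
    return sum(rng.random() < position for _ in range(max_logins))


def churn_events(positions, threshold=0.2):
    """Threshold event: fire once, in the first period below the threshold."""
    for period, position in enumerate(positions):
        if position < threshold:
            return [period]
    return []


rng = random.Random(11)
declining = [0.9, 0.8, 0.6, 0.45, 0.3, 0.15, 0.05]  # hypothetical trajectory
logins = [login_count(p, rng) for p in declining]
churn = churn_events(declining)
print(logins, churn)
```

Because both functions read the same list of positions, a decline shows up in both places at once: login counts shrink and a churn event fires in the first period whose position is under the threshold.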
Where to go next
- User guide → How it works: the mental model in five minutes, then archetypes, metrics, seasonality, schema, output
- Tutorials: runnable notebooks covering each feature surface
- Reference → Config fields: every input field with type, default, and constraints; companion docs for API, column types, manifest
What plotsim is not
- Not an ML model trained on real data. plotsim takes a YAML spec; it does not learn from samples.
- Not an LLM-driven generator. The engine is deterministic. Same config + same seed = byte-identical output.
- Not a Faker replacement. Faker fills cells; plotsim composes coherent multi-table datasets where the cells agree.
- Not a privacy tool. Faker output looks realistic but is not anonymized. Treat string columns (names, companies) as synthetic, not safe-to-publish.
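The determinism claim in the list above (same config plus same seed gives byte-identical output) is the kind of property that can be checked with a hash comparison. The sketch below shows the general pattern with a stand-in generator; generate_csv_bytes is hypothetical and does not call plotsim.

```python
import csv
import hashlib
import io
import random


def generate_csv_bytes(seed, rows=12):
    """Render a small table to CSV bytes using only a seeded local RNG."""
    rng = random.Random(seed)  # no global state, no clock, no network
    buffer = io.StringIO(newline="")
    # Fixed line terminator and fixed float formatting keep bytes stable.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["month", "engagement"])
    for month in range(1, rows + 1):
        writer.writerow([f"2024-{month:02d}", f"{rng.random():.3f}"])
    return buffer.getvalue().encode("utf-8")


first = hashlib.sha256(generate_csv_bytes(seed=42)).hexdigest()
second = hashlib.sha256(generate_csv_bytes(seed=42)).hexdigest()
other = hashlib.sha256(generate_csv_bytes(seed=43)).hexdigest()
print(first == second, first == other)
```

The same comparison could be applied to real output by hashing each generated file from two runs of the same config and seed, then comparing the digests.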