plotsim

Generate multi-table synthetic datasets where the metrics tell a story — not random noise.

Most synthetic data tools generate columns independently. Revenue is random. Engagement is random. Churn is random. The numbers fill a schema, but they don't behave like real data — because in real data, these things move together.

plotsim generates relational test data with shape: every entity follows a behavioral trajectory, and every metric — across every table, every foreign key, every time period — is derived from the same trajectory position. When engagement rises, revenue follows. When it declines, churn fires.
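To make the difference concrete, here is a minimal sketch of the two approaches. It is not plotsim's engine: the function names, the noise range, and the formulas for `mrr` and `churn_risk` are all assumptions chosen for illustration. The only point is that the second generator derives every column from one shared position.

```python
import random

def independent_row(rng):
    """Faker-style: each column is drawn on its own."""
    return {
        "engagement": round(rng.random(), 3),
        "mrr": round(rng.uniform(100, 8000)),
        "churn_risk": round(rng.random(), 3),
    }

def trajectory_row(position, rng):
    """Every column is a function of one shared position in [0, 1]."""
    noise = rng.uniform(-0.05, 0.05)  # hypothetical jitter
    engagement = round(min(1.0, max(0.0, position + noise)), 3)
    return {
        "engagement": engagement,
        # Hypothetical formulas: revenue rises with engagement,
        # churn risk rises as engagement falls.
        "mrr": round(500 + 4000 * engagement),
        "churn_risk": round(max(0.0, 0.6 - engagement), 3),
    }

rng = random.Random(42)
positions = [i / 11 for i in range(12)]  # a simple rising trajectory
rows = [trajectory_row(p, rng) for p in positions]
for row in rows:
    print(row)
```

In the second generator, sorting rows by engagement also sorts them by MRR and reverse-sorts them by churn risk. `independent_row` gives no such guarantee.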

See it

The same SaaS schema, generated two ways. One company, twelve months.

Random columns (Faker-style) vs. plotsim (trajectory-correlated)

Every column is independent. The numbers don't agree with each other.

month	engagement	mrr	tickets	churn_risk
2024-01	0.842	$483	7	0.611
2024-02	0.117	$4,201	0	0.043
2024-03	0.674	$1,089	11	0.892
2024-04	0.298	$112	2	0.355
2024-05	0.951	$7,733	4	0.018
2024-06	0.024	$964	9	0.477
2024-07	0.560	$2,154	1	0.802
2024-08	0.405	$328	6	0.220
2024-09	0.789	$617	0	0.998
2024-10	0.131	$5,440	8	0.156
2024-11	0.847	$192	3	0.501
2024-12	0.334	$3,876	12	0.063

In May, engagement of 0.951 sits beside a churn risk of 0.018. In September, engagement of 0.789 sits beside 0.998, the highest churn risk in the table. There is no story here, only fields filled.

The same schema, generated by plotsim run saas: one real dim_company row, showing 12 of its 24 monthly rows drawn from fct_engagement, fct_revenue, and fct_support_tickets.

month	engagement	mrr	tickets	churn_risk
2024-01	0.587	$1,191	0	0.261
2024-02	0.807	$1,265	1	0.189
2024-03	1.000	$3,532	2	0.129
2024-04	0.593	$818	0	0.171
2024-05	0.904	$3,567	2	0.237
2024-06	0.956	$4,264	1	0.257
2024-07	1.000	$302	2	0.000
2024-08	0.917	$1,507	0	0.000
2024-09	1.000	$890	1	0.000
2024-10	0.783	$512	1	0.264
2024-11	0.956	$837	0	0.000
2024-12	0.827	$351	1	0.248

Engagement climbs from 0.587 toward its plateau near 1.0, and MRR rises with it through June. Support tickets never exceed two a month. Churn risk stays at or below 0.264 and is zero in four of the twelve months. All four columns read from the same underlying trajectory position, not from four independent random generators.

The contrast is the entire product.

Install

pip install plotsim

Requires Python 3.10+. Zero network calls at generation time. All bundled templates work offline.

Run a template

plotsim template saas -o config.yaml
plotsim run config.yaml -o ./output

Or from Python — the builder API is the front door:

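from plotsim import create_from_yaml, generate_tables, write_tables

# Load the config written by `plotsim template`.
cfg = create_from_yaml("config.yaml")
# Generate the tables the config describes.
tables = generate_tables(cfg)
# Write the tables to disk.
write_tables(tables, cfg)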

Either flow produces a complete star schema:

output/
├── dim_date.csv                # complete date spine
├── dim_company.csv             # entity attributes (with SCD2 plan_tier)
├── dim_user.csv                # sub-entity attributes
├── dim_plan.csv                # reference lookup
├── fct_engagement.csv          # entity × period metrics
├── fct_revenue.csv             # entity × period metrics
├── fct_support_tickets.csv     # entity × period metrics
├── evt_login.csv               # proportional events
├── evt_churn.csv               # threshold-triggered events
├── config.yaml                 # frozen copy of the input config
└── validation_report.txt       # FK + PK + spine integrity checks
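For readers unfamiliar with foreign-key checks, the sketch below shows the kind of test a validation report summarizes: every key in a fact table must exist in its dimension table. It is not plotsim's validator. The column name `company_id`, the company names, and the sample rows are made up for illustration, and the third fact row is deliberately orphaned.

```python
import csv
import io

# Tiny stand-ins for dim_company.csv and fct_engagement.csv.
# The column name company_id and all values are hypothetical.
DIM_COMPANY = "company_id,name\n1,Acme\n2,Globex\n"
FCT_ENGAGEMENT = (
    "company_id,month,engagement\n"
    "1,2024-01,0.587\n"
    "2,2024-01,0.412\n"
    "3,2024-01,0.900\n"  # no company 3 in the dimension table
)

def orphan_keys(dim_csv, fact_csv, key):
    """Return fact-table key values that have no matching dimension row."""
    dim_keys = {row[key] for row in csv.DictReader(io.StringIO(dim_csv))}
    fact_keys = {row[key] for row in csv.DictReader(io.StringIO(fact_csv))}
    return sorted(fact_keys - dim_keys)

orphans = orphan_keys(DIM_COMPANY, FCT_ENGAGEMENT, "company_id")
print(orphans)  # ['3']
```

To run the same check against real output, read the CSV files from disk instead of the inline strings and substitute the key column your schema actually uses.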

If a company's engagement trajectory declines, its login events decrease in evt_login.csv and churn events appear in evt_churn.csv — because both tables read from the same underlying trajectory.
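The two event styles named in the listing, proportional and threshold-triggered, can be sketched as follows. The function names, the `max_logins` scale, and the 0.2 cutoff are assumptions for illustration. plotsim's actual rules come from its config and may differ.

```python
def login_count(engagement, max_logins=40):
    """Proportional event: login volume scales with engagement."""
    return round(engagement * max_logins)

def churn_fired(engagement, threshold=0.2):
    """Threshold-triggered event: fires once engagement drops below a cutoff."""
    return engagement < threshold

# A hypothetical declining trajectory for one company.
declining = [0.9, 0.7, 0.5, 0.3, 0.15]

logins = [login_count(e) for e in declining]
churned = [churn_fired(e) for e in declining]

for e, n, c in zip(declining, logins, churned):
    print(f"engagement={e:.2f} logins={n} churn_event={c}")
```

Because both functions read the same engagement value, login volume shrinks period by period and the churn event appears only in the final period, when engagement falls below the cutoff.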

Where to go next

User guide → Getting started: first dataset on disk in under sixty seconds, then archetypes, metrics, seasonality, schema, output
Templates: six bundled domain templates (banking, health, HR, marketing, retail, SaaS)
Reference → Config fields: every input field with type, default, and constraints; companion docs for API, column types, manifest

What plotsim is not

Not an ML model trained on real data. plotsim takes a YAML spec; it does not learn from samples.
Not an LLM-driven generator. The engine is deterministic. Same config + same seed = byte-identical output.
Not a Faker replacement. Faker fills cells; plotsim composes coherent multi-table datasets where the cells agree.
Not a privacy tool. Faker output looks realistic but is not anonymized. Treat string columns (names, companies) as synthetic, not safe-to-publish.
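The determinism claim above (same config and same seed give byte-identical output) is easy to check from outside the tool. The helper below is a sketch that uses only the standard library and is not part of plotsim. It hashes every file in an output directory into one digest, so two runs can be compared with a single equality test. The directory names in the usage comment are hypothetical.

```python
import hashlib
from pathlib import Path

def digest_directory(root):
    """Hash every file under root, in sorted path order, into one SHA-256 digest."""
    root = Path(root)
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Include the relative path so a renamed file changes the digest too.
        h.update(path.relative_to(root).as_posix().encode("utf-8"))
        h.update(path.read_bytes())
    return h.hexdigest()

# Usage, assuming two runs of the same config were written to
# ./output_a and ./output_b (hypothetical directory names):
#   digest_directory("output_a") == digest_directory("output_b")
```

If the digests differ, hash the files one at a time to find which table changed.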