rwerner0615/synthetic-data-generator

Synthetic Data Generator (Python)

This project generates synthetic data from scratch, based on constraints and simple rules, without requiring an original dataset.

I built this because in analytics, market research, and capstone projects, you often need realistic-looking data to prototype analysis, dashboards, or workflows, but can’t use real or proprietary data.

What this tool does

  • Generates CSV datasets from a YAML specification
  • Supports numeric ranges, categories, dates, and IDs
  • Allows rule-based dependencies between fields
  • Works for any topic (not tied to a specific domain)

The generator doesn’t assume anything about the data’s meaning; it just follows the structure you define.
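
To make the supported column types concrete, here is a minimal sketch of how numeric ranges, categories, dates, and IDs could each be produced from a spec. It is not the project's actual code: the `SPEC` dictionary stands in for what a YAML file might parse into, and every key name (`type`, `min`, `max`, `values`, `prefix`, and so on) as well as the `make_value` and `generate_rows` helpers are assumptions made for illustration.

```python
import random
from datetime import date, timedelta

# Hypothetical spec: the dict a YAML file might parse into.
# The key names below are assumptions, not the project's real schema.
SPEC = {
    "rows": 5,
    "columns": {
        "respondent_id": {"type": "id", "prefix": "R", "start": 1},
        "age": {"type": "int", "min": 18, "max": 80},
        "spend": {"type": "float", "min": 0.0, "max": 500.0, "decimals": 2},
        "segment": {"type": "category", "values": ["A", "B", "C"]},
        "signup": {"type": "date", "start": "2023-01-01", "end": "2023-12-31"},
    },
}


def make_value(col, row_index, rng):
    """Return one value for a single column definition."""
    kind = col["type"]
    if kind == "id":
        # Sequential, zero-padded identifier such as R0001.
        return f"{col['prefix']}{col['start'] + row_index:04d}"
    if kind == "int":
        return rng.randint(col["min"], col["max"])  # inclusive bounds
    if kind == "float":
        return round(rng.uniform(col["min"], col["max"]), col["decimals"])
    if kind == "category":
        return rng.choice(col["values"])
    if kind == "date":
        start = date.fromisoformat(col["start"])
        end = date.fromisoformat(col["end"])
        offset = rng.randint(0, (end - start).days)
        return (start + timedelta(days=offset)).isoformat()
    raise ValueError(f"unknown column type: {kind}")


def generate_rows(spec, seed=0):
    """Build one dict per row; a fixed seed makes the output repeatable."""
    rng = random.Random(seed)
    return [
        {name: make_value(col, i, rng) for name, col in spec["columns"].items()}
        for i in range(spec["rows"])
    ]


rows = generate_rows(SPEC, seed=42)
```

Nothing in the sketch knows what "age" or "spend" means. Renaming the columns or changing the bounds in the spec changes the output without touching the functions.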

How it works (conceptually)

  1. You describe the dataset shape in a YAML file (columns, bounds, rules)
  2. The generator creates base values within those bounds
  3. Rules override values where conditions are met
  4. The result is written to a CSV file
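
The steps above can be sketched in a few lines using only the standard library. This is an illustrative outline, not the contents of synthgen.py: the rule format in `RULES`, the example columns `plan` and `monthly_fee`, and the function names are all hypothetical, and the rules are written as a Python list standing in for what the YAML spec would provide.

```python
import csv
import io
import random

# Step 1 (stand-in): rules as they might look after parsing the YAML spec.
# Hypothetical format: when `column` equals `equals`, apply the `set` values.
RULES = [
    {"when": {"column": "plan", "equals": "free"}, "set": {"monthly_fee": 0}},
]


def base_rows(n, rng):
    """Step 2: draw base values within bounds, ignoring dependencies."""
    return [
        {
            "plan": rng.choice(["free", "pro"]),
            "monthly_fee": rng.randint(5, 50),
        }
        for _ in range(n)
    ]


def apply_rules(rows, rules):
    """Step 3: override values on rows where a rule's condition holds."""
    for row in rows:
        for rule in rules:
            cond = rule["when"]
            if row[cond["column"]] == cond["equals"]:
                row.update(rule["set"])
    return rows


def write_csv(rows, handle):
    """Step 4: write the rows as CSV to any text file handle."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


rng = random.Random(7)
rows = apply_rules(base_rows(10, rng), RULES)

# An in-memory buffer keeps the sketch side-effect free; to write a file,
# use open("output.csv", "w", newline="") instead.
buffer = io.StringIO()
write_csv(rows, buffer)
```

Generating base values first and applying rules afterwards keeps each rule a simple override: a free plan always ends up with a fee of 0, whatever the random draw was.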

Why YAML

YAML keeps the data definition readable and easy to change without editing Python code. Most changes happen in the spec, not the generator.

Limitations

  • Rule-based dependencies (not statistical modeling)
  • No guarantee of real-world distributions
  • Optimized for clarity and flexibility, not scale

Example use cases

  • Capstone projects
  • Analytics prototyping
  • Market research simulations
  • Synthetic datasets for dashboards or demos

How to run

python synthgen.py

About

Python tool for generating synthetic CSV data from YAML-defined constraints and rule-based dependencies
