Generate realistic, interconnected, and reproducible test data for finance, healthcare, and beyond.
Faker gives you random data. DATAMIMIC gives you consistent, explainable datasets that respect business logic and domain constraints.
- 🧬 Patient medical histories that match age and demographics
- 💳 Bank transactions that obey balance constraints
- 🛡️ Insurance policies aligned with real risk profiles
Typical data generators produce isolated random values. That's fine for unit tests, but meaningless for system, analytics, or compliance testing.
# Faker: broken relationships
from faker import Faker
fake = Faker()
patient_name = fake.name()
patient_age = fake.random_int(1, 99)
conditions = [fake.word()]
# "25-year-old with Alzheimer's" → nonsense data
# DATAMIMIC: contextual realism
from datamimic_ce.domains.healthcare.services import PatientService
patient = PatientService().generate()
print(f"{patient.full_name}, {patient.age}, {patient.conditions}")
# "Shirley Thompson, 72, ['Diabetes', 'Hypertension']"
Install and run:
pip install datamimic-ce
DATAMIMIC produces the same data for the same request, across machines and CI runs. Seeds, clocks, and UUIDv5 namespaces enforce reproducibility.
from datamimic_ce.domains.facade import generate_domain
request = {
    "domain": "person",
    "version": "v1",
    "count": 1,
    "seed": "docs-demo",  # identical seed → identical output
    "locale": "en_US",
    "clock": "2025-01-01T00:00:00Z"  # fixed clock = stable time context
}
response = generate_domain(request)
print(response["items"][0]["id"])
# Same input → same output
Determinism Contract
- Inputs: {seed, clock, uuidv5-namespace, request body}
- Guarantees: byte-identical payloads + stable determinism_proof.content_hash
- Scope: all CE domains (see docs for domain-specific caveats)
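The contract above can be illustrated with a minimal standard-library sketch. It is not DATAMIMIC's implementation: the `generate` function, the `NAMESPACE` value, and the payload fields are hypothetical. It only shows why a seeded RNG, a frozen clock, UUIDv5 IDs, and canonical JSON hashing together give identical output for identical input.

```python
import hashlib
import json
import random
import uuid

# Hypothetical namespace for this sketch; not a DATAMIMIC constant.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def generate(request: dict) -> dict:
    """Build a payload that depends only on the request contents."""
    rng = random.Random(request["seed"])  # seeded RNG, no global state
    items = []
    for index in range(request["count"]):
        # UUIDv5 is a pure function of (namespace, name), so IDs are stable.
        name = f'{request["seed"]}:{request["clock"]}:{index}'
        items.append({
            "id": str(uuid.uuid5(NAMESPACE, name)),
            "age": rng.randint(18, 90),
            "created_at": request["clock"],  # frozen clock instead of now()
        })
    # Canonical JSON (sorted keys, fixed separators) gives a stable hash.
    canonical = json.dumps(items, sort_keys=True, separators=(",", ":"))
    content_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"items": items, "determinism_proof": {"content_hash": content_hash}}


request = {"seed": "docs-demo", "count": 2, "clock": "2025-01-01T00:00:00Z"}
first = generate(request)
second = generate(request)
print(first == second)  # True
```

Anything that reads wall-clock time or global random state inside `generate` would break the guarantee, which is why both are passed in through the request.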
Run DATAMIMIC as an MCP server so Claude / Cursor (and agents) can call deterministic data tools.
Install
pip install datamimic-ce[mcp]
# Development
pip install -e .[mcp]
Run (SSE transport)
export DATAMIMIC_MCP_HOST=127.0.0.1
export DATAMIMIC_MCP_PORT=8765
# Optional auth; clients must send the same token via Authorization: Bearer or X-API-Key
export DATAMIMIC_MCP_API_KEY=changeme
datamimic-mcp
In-process example (determinism proof)
import anyio, json
from fastmcp.client import Client
from datamimic_ce.mcp.models import GenerateArgs
from datamimic_ce.mcp.server import create_server
async def main():
    args = GenerateArgs(domain="person", locale="en_US", seed=42, count=2)
    payload = args.model_dump(mode="python")
    async with Client(create_server()) as c:
        # Call the same tool twice with identical arguments
        a = await c.call_tool("generate", {"args": payload})
        b = await c.call_tool("generate", {"args": payload})
        # Identical inputs should yield identical content hashes
        print(json.loads(a[0].text)["determinism_proof"]["content_hash"]
              == json.loads(b[0].text)["determinism_proof"]["content_hash"])  # True
anyio.run(main)
Config keys
- DATAMIMIC_MCP_HOST (default 127.0.0.1)
- DATAMIMIC_MCP_PORT (default 8765)
- DATAMIMIC_MCP_API_KEY (unset = no auth)
- Requests over the cap (count > 10_000) are rejected with 422.
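As a rough sketch of how these keys behave, the snippet below reads the three variables with the documented defaults and mimics the count cap. The helper names (`load_mcp_config`, `check_count`, `auth_headers`) are hypothetical and not part of the DATAMIMIC API, and the 200 returned for accepted requests is an assumption made only for illustration.

```python
import os

MAX_COUNT = 10_000  # cap documented above


def load_mcp_config(env=None) -> dict:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "host": env.get("DATAMIMIC_MCP_HOST", "127.0.0.1"),
        "port": int(env.get("DATAMIMIC_MCP_PORT", "8765")),
        "api_key": env.get("DATAMIMIC_MCP_API_KEY") or None,  # unset = no auth
    }


def check_count(count: int) -> int:
    """Return 422 for requests over the cap; 200 otherwise (assumed for the sketch)."""
    return 422 if count > MAX_COUNT else 200


def auth_headers(api_key) -> dict:
    """Client-side header for the Bearer variant; X-API-Key is the alternative."""
    return {"Authorization": f"Bearer {api_key}"} if api_key else {}


config = load_mcp_config({})  # empty environment: defaults only
print(config)
print(check_count(10_000), check_count(10_001))
```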
➡️ Full guide, IDE configs (Claude/Cursor), transports, errors: docs/mcp_quickstart.md
from datamimic_ce.domains.healthcare.services import PatientService
patient = PatientService().generate()
print(patient.full_name, patient.conditions)
- Demographically realistic patients
- Doctor specialties match conditions
- Hospital capacities and types
- Longitudinal medical records
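To make "conditions that match age" concrete, here is a small standard-library sketch of age-conditioned sampling. It does not reflect DATAMIMIC's datasets or internals: `MIN_AGE_FOR_CONDITION` and its thresholds are invented for illustration and are not clinical data, and `generate_patient` is a hypothetical helper.

```python
import random

# Invented minimum ages, for illustration only; not clinical data and not
# taken from DATAMIMIC.
MIN_AGE_FOR_CONDITION = {
    "Asthma": 0,
    "Hypertension": 30,
    "Alzheimer's": 65,
}


def generate_patient(rng: random.Random) -> dict:
    """Draw an age first, then sample only conditions allowed at that age."""
    age = rng.randint(1, 99)
    eligible = sorted(
        name for name, min_age in MIN_AGE_FOR_CONDITION.items() if age >= min_age
    )
    # Pick between zero and all eligible conditions.
    conditions = rng.sample(eligible, k=rng.randint(0, len(eligible)))
    return {"age": age, "conditions": sorted(conditions)}


rng = random.Random(1337)
patients = [generate_patient(rng) for _ in range(5)]
for patient in patients:
    print(patient)
```

Because the age is drawn before the conditions, a record such as a 25-year-old with Alzheimer's cannot be produced under these rules.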
from datamimic_ce.domains.finance.services import BankAccountService
account = BankAccountService().generate()
print(account.account_number, account.balance)
- Balances respect transaction histories
- Card/IBAN formats per locale
- Distributions tuned for fraud/reconciliation tests
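The idea that balances respect transaction histories can be sketched as follows. This is a standard-library illustration, not the logic inside BankAccountService; `generate_transactions`, the 60% debit probability, and the amount ranges are all assumptions chosen for the example.

```python
import random
from decimal import Decimal


def generate_transactions(rng: random.Random, opening_balance: Decimal, count: int):
    """Generate a history whose running balance never drops below zero."""
    balance = opening_balance
    history = []
    for _ in range(count):
        if balance > 0 and rng.random() < 0.6:
            # Debit: capped at the current balance (amounts drawn in whole cents).
            cents = rng.randint(1, int(balance * 100))
            amount = -Decimal(cents) / 100
        else:
            # Credit: arbitrary illustrative range of up to 500.00.
            cents = rng.randint(1, 50_000)
            amount = Decimal(cents) / 100
        balance += amount
        history.append({"amount": amount, "balance_after": balance})
    return history, balance


rng = random.Random(2024)
opening = Decimal("1000.00")
history, closing = generate_transactions(rng, opening, 20)
# The closing balance is fully explained by the opening balance and the history.
print(closing == opening + sum(tx["amount"] for tx in history))  # True
```

Using Decimal with amounts drawn in cents keeps the reconciliation check exact, which a float-based version would not guarantee.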
PersonService with locale packs (DE / US / VN), versioned and auditable
- Frozen clocks + canonical hashing → reproducible IDs
- Seeded RNG → identical outputs across runs
- Schema validation (XSD/JSONSchema) → structural integrity
- Provenance hashing → audit-ready lineage
See Developer Guide
Python:
from random import Random
from datamimic_ce.domains.common.models.demographic_config import DemographicConfig
from datamimic_ce.domains.healthcare.services import PatientService
cfg = DemographicConfig(age_min=70, age_max=75)
svc = PatientService(dataset="US", demographic_config=cfg, rng=Random(1337))
print(svc.generate().to_dict())
Equivalent XML:
<setup>
<generate name="seeded_seniors" count="3" target="CSV">
<variable name="patient" entity="Patient" dataset="US" ageMin="70" ageMax="75" rngSeed="1337" />
<key name="full_name" script="patient.full_name" />
<key name="age" script="patient.age" />
<array name="conditions" script="patient.conditions" />
</generate>
</setup>
# Run instant healthcare demo
datamimic demo create healthcare-example
datamimic run ./healthcare-example/datamimic.xml
# Verify version
datamimic version
Quality gates (repo):
make typecheck # mypy --strict
make lint # pylint (≥9.0 score target)
make coverage # target ≥ 90%
- Core pipeline: Determinism kit • Domain services • Schema validators
- Governance layer: Group tables • Linkage audits • Provenance hashing
- Execution layer: CLI • API • XML runners • MCP server
| Feature | Community (CE) | Enterprise (EE) |
|---|---|---|
| Deterministic domain generation | ✅ | ✅ |
| XML + Python pipelines | ✅ | ✅ |
| Healthcare & Finance domains | ✅ | ✅ |
| Multi-user collaboration | ❌ | ✅ |
| Governance & lineage dashboards | ❌ | ✅ |
| ML engines (Mostly AI, Synthcity, …) | ❌ | ✅ |
| RBAC & audit logging (HIPAA/GDPR/PCI) | ❌ | ✅ |
| EDIFACT / SWIFT adapters | ❌ | ✅ |
Compare editions • Book a strategy call
pip install datamimic-ce
Generate data that makes sense, deterministically. ⭐ Star us on GitHub if DATAMIMIC improves your testing workflow.