Data Platform Modernization Agent

Agentic AI for legacy-to-cloud data platform modernization. An agent pipeline migrates enterprises off legacy systems and onto any modern cloud warehouse — with a human approving every AI-generated change.

Status: Working prototype / demo-stage · mock data · solo build · AIBoomi Startup Weekend

Live demo: https://jemathew.github.io/data-platform-modernization-agent/ · Demo video: {DEMO_VIDEO} · Repo: https://github.com/JEMathew/data-platform-modernization-


Overview

Why now. Legacy data platforms (Oracle, Teradata, SQL Server, Hadoop, Informatica) carry fixed licensing and hardware costs, can't feed modern AI/analytics, don't scale, depend on a shrinking PL/SQL talent pool, and eventually reach end-of-life, so enterprises must modernize. But ~83% of migrations run over budget or fail, with legacy complexity the #1 cause. A project is usually kicked off by a concrete trigger (a license renewal, an AI mandate, an end-of-life deadline, a capacity wall, M&A, or a stalled prior attempt), and each trigger is a buying moment.

{PRODUCT_NAME} is a unified agent console that runs a pipeline of AI agents — Profiler, Mapper, Code-gen, Validator — to assess a legacy estate, map it to a modern target (Snowflake, BigQuery, Databricks, Fabric), generate the migration code, and validate the result. A deterministic engine does the schema translation; an optional LLM handles the ambiguous procedural code — and a human approves every draft before anything ships.
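The division of labor described above (deterministic engine for schema, optional LLM for ambiguous procedural code, human approval before anything ships) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the console's actual code: `routeObject`, `makeDraft`, `review`, and `shippable` are hypothetical names, and treating an edited draft as approved is an assumption made for the example.

```javascript
// Hypothetical sketch: none of these names come from the repository.

// Pick which engine drafts the translation for one source object.
function routeObject(obj, { hasApiKey }) {
  if (obj.kind === "table") return "deterministic"; // schema translation
  // Procedural code is ambiguous: use the optional LLM only if a key exists,
  // otherwise hand it straight to a human.
  if (obj.kind === "procedure") return hasApiKey ? "llm" : "human";
  return "human";
}

// Every draft starts as "pending", whichever engine produced it.
function makeDraft(obj, engine, code) {
  return { name: obj.name, engine, code, status: "pending" };
}

// Apply a reviewer decision: approve, edit, or reject.
function review(draft, decision, editedCode) {
  if (!["approve", "edit", "reject"].includes(decision)) {
    throw new Error(`Unknown decision: ${decision}`);
  }
  if (decision === "edit") {
    // Assumption: an edited draft counts as approved, since a human wrote it.
    return { ...draft, code: editedCode, status: "approved" };
  }
  return { ...draft, status: decision === "approve" ? "approved" : "rejected" };
}

// Only human-approved drafts are eligible to ship.
function shippable(drafts) {
  return drafts.filter((d) => d.status === "approved");
}
```

The point of the sketch is that approval is a property of the draft rather than of the engine, so deterministic output and model output pass through the same gate.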

The idea / why now

End-to-end automated migration already exists — but as services-heavy, six-figure accelerator engagements from specialist vendors. Frontier LLMs now make the hardest part — procedural-code translation — automatable as a self-serve product instead of a consulting project. Our wedge is the product-led experience and the in-app human-in-the-loop approval gate, not "vendor-neutral end-to-end" (which is now table stakes).

Why modernize (drivers). Legacy platforms carry fixed licensing/hardware cost, can't feed modern AI/analytics, don't scale, depend on a shrinking PL/SQL talent pool, slow time-to-insight, and reach end-of-life. Cloud warehouses/lakehouses fix all of these — but ~83% of migrations run over budget or fail, with legacy complexity the #1 cause.

What starts a project now (triggers). A license renewal or hardware refresh, an AI mandate, a cloud-first program, product end-of-life, a performance/capacity wall, M&A consolidation, a regulatory change, attrition of the last engineer who knows the legacy procs, FinOps cost-cutting, or a stalled prior attempt. Each trigger is also a buying moment — exactly when a team goes looking for this tool.

What it does

| Step | Agent | What happens |
| --- | --- | --- |
| Assess | Profiler | Surfaces schema messiness, complexity, and risk on the source |
| Map | Mapper | Proposes source→target field mapping (rule-driven, clickable) |
| Generate | Code-gen | Transpiles to target DDL, migration SQL, and stored-procedure hand-off (offline engine; optional LLM) |
| Review | None (human gate) | Low-confidence output flagged for approve / edit / reject |
| Validate | Validator | Deterministic row- and cell-level reconciliation scorecard |

Plus console views for ROI, lineage, and AI-readiness.
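To make the Validate step concrete, here is a minimal sketch of what deterministic row- and cell-level reconciliation might look like. It is an illustration only: the `reconcile` function, its parameters, and the shape of the returned scorecard are assumptions, not the prototype's API. Comparison uses strict equality, so it assumes values were normalized to comparable types beforehand.

```javascript
// Hypothetical sketch of exact reconciliation; names and result shape are illustrative.
function reconcile(sourceRows, targetRows, keyColumn) {
  const targetByKey = new Map(targetRows.map((row) => [row[keyColumn], row]));
  const sourceKeys = new Set(sourceRows.map((row) => row[keyColumn]));

  const missingInTarget = []; // keys present in source but absent in target
  const cellMismatches = [];  // per-cell differences for rows present in both
  let cellsCompared = 0;

  for (const src of sourceRows) {
    const tgt = targetByKey.get(src[keyColumn]);
    if (tgt === undefined) {
      missingInTarget.push(src[keyColumn]);
      continue;
    }
    for (const column of Object.keys(src)) {
      cellsCompared += 1;
      if (src[column] !== tgt[column]) {
        cellMismatches.push({
          key: src[keyColumn],
          column,
          source: src[column],
          target: tgt[column],
        });
      }
    }
  }

  // Rows that exist only in the target.
  const extraInTarget = targetRows
    .map((row) => row[keyColumn])
    .filter((key) => !sourceKeys.has(key));

  return {
    sourceRowCount: sourceRows.length,
    targetRowCount: targetRows.length,
    missingInTarget,
    extraInTarget,
    cellsCompared,
    cellMismatches,
    passed:
      missingInTarget.length === 0 &&
      extraInTarget.length === 0 &&
      cellMismatches.length === 0,
  };
}
```

Because every check is an exact comparison, the same inputs always yield the same scorecard, which is the property the Validator relies on.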

Tech stack & tools

  • Front end: standalone HTML / CSS / JavaScript agent console (runs offline; no backend required to view).
  • Code-gen engine: a deterministic transpiler runs fully in-browser (no model, no network) — it parses a pasted legacy schema and generates target DDL, migration SQL, type mappings, and confidence-flagged fields in real time, including on inputs it has never seen. Procedural code (PL/SQL, cursors) is detected and routed to the human review gate rather than auto-converted.
  • Optional LLM step: the Code-gen UI accepts an Anthropic API key at runtime; if provided, ambiguous translation is sent to a model (Claude) instead of the offline engine. No key is bundled and none is required to run the demo.
  • Deterministic by design: profiling, orchestration, and validation are rule-based, not model-driven — validation is exact reconciliation, not an LLM guess.
  • Targets: Snowflake, BigQuery, Databricks, Microsoft Fabric. Sources: Oracle, Teradata, SQL Server, Hadoop, Informatica.
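As a rough illustration of the offline Code-gen engine described in the list above, the sketch below maps a pasted Oracle `CREATE TABLE` statement to Snowflake-style DDL, flags low-confidence columns, and detects procedural code so it can go to review instead. The rule table, the function names (`mapType`, `isProcedural`, `transpileTable`), and the confidence labels are all assumptions made for this example; the prototype's actual rules may differ. It handles only plain `name TYPE` column definitions, with no constraints or defaults.

```javascript
// Illustrative Oracle -> Snowflake rules; NOT the repository's actual mapping table.
const TYPE_RULES = [
  { pattern: /^VARCHAR2\((\d+)(?:\s+(?:BYTE|CHAR))?\)$/i, map: (m) => `VARCHAR(${m[1]})`, confidence: "high" },
  { pattern: /^NUMBER\((\d+)(?:,\s*(\d+))?\)$/i, map: (m) => `NUMBER(${m[1]},${m[2] ?? 0})`, confidence: "high" },
  // Oracle DATE also stores a time of day, so the target type deserves a human look.
  { pattern: /^DATE$/i, map: () => "TIMESTAMP_NTZ", confidence: "low" },
  { pattern: /^CLOB$/i, map: () => "VARCHAR", confidence: "low" },
];

function mapType(sourceType) {
  const type = sourceType.trim();
  for (const rule of TYPE_RULES) {
    const match = type.match(rule.pattern);
    if (match) return { target: rule.map(match), confidence: rule.confidence };
  }
  // Unknown type: pass it through unchanged and flag it for review.
  return { target: type, confidence: "low" };
}

// Procedural constructs are detected, not converted.
const PROCEDURAL =
  /\b(CREATE\s+(OR\s+REPLACE\s+)?(PROCEDURE|FUNCTION|PACKAGE|TRIGGER)|CURSOR|DECLARE|BEGIN)\b/i;

function isProcedural(sql) {
  return PROCEDURAL.test(sql);
}

function transpileTable(ddl) {
  if (isProcedural(ddl)) throw new Error("Procedural code: route to human review");
  const match = ddl.match(/CREATE\s+TABLE\s+(\w+)\s*\(([\s\S]*)\)\s*;?\s*$/i);
  if (!match) throw new Error("Not a simple CREATE TABLE statement");
  const [, table, body] = match;

  const columns = body
    .split(/,(?![^()]*\))/) // split on commas outside parentheses, e.g. not in NUMBER(10,2)
    .map((part) => part.trim())
    .filter(Boolean)
    .map((definition) => {
      const parts = definition.match(/^(\w+)\s+(.+)$/);
      if (!parts) throw new Error(`Cannot parse column definition: ${definition}`);
      return { name: parts[1], ...mapType(parts[2]) };
    });

  const lines = columns.map((c) => `  ${c.name} ${c.target}`).join(",\n");
  return {
    ddl: `CREATE TABLE ${table} (\n${lines}\n);`,
    flagged: columns.filter((c) => c.confidence === "low").map((c) => c.name),
  };
}
```

Calling `transpileTable` on a small schema returns the target DDL plus a `flagged` list of column names, which is the kind of input the review gate would consume.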

What's real vs. roadmap (honest)

Real in the prototype: the agent console; the Profiler, Mapper, Code-gen, and Validator agents; the Assessment, Mapping, Code (DDL/SQL/proc), Review (human-in-the-loop), Validation, and AI-readiness modules; and a working offline transpiler that turns pasted legacy DDL into target-cloud code live (with an optional LLM step if a key is supplied).

Roadmap (not built): auto-discovery across many databases, dependency graph, deployment / cutover, documentation agent, real source/target connectors, ETL-pipeline and BI/report migration, security & access control, pluggable agent registry.

Run it / view it

  1. Clone: git clone {REPO_URL}
  2. Open migration-agent-app.html in a modern browser — the full console runs offline on mock data.
  3. (Optional) To enable live Code-gen, set {ANTHROPIC_API_KEY} per the note in /config, then re-run the Code module.

Demo

  • Video walkthrough: {DEMO_VIDEO}
  • Hosted app: {DEMO_LINK}
  • Screenshots: add console-overview.png, code-gen-live.png, review-gate.png to /docs/img.

Roadmap

Migrate (now) → real connectors + Discovery / dependency graph → deployment & cutover → Documentation & Architecture agents → pluggable agent platform → broader modernization (governance, quality, AI-readiness).

Competitive note

Mature specialists (Next Pathway, LeapLogic) already deliver vendor-neutral, end-to-end, ~95%-automated migration as consulting-led engagements. We don't claim to out-feature them; we compete on a self-serve, product-led experience with a human-in-the-loop gate for teams who can't or won't run a six-figure program.

Author

Jincen E. Mathew — https://www.linkedin.com/in/jincenmathew/


Built for AIBoomi Startup Weekend. Demo runs on synthetic data; no customer or licensed data is used.
