Rocky checks your SQL data pipelines and catches problems before they reach your warehouse.
Works with Databricks, Snowflake, BigQuery, and DuckDB. You keep your warehouse and your existing SQL. Apache 2.0.
The failures that cost data teams the most are the quiet ones: a source column type changes and breaks something downstream, a column gets renamed and three models stop working, a query runs fine in dev but fails in prod. Rocky catches all of these at check time, before anything runs.
    # macOS / Linux
    curl -fsSL https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.sh | bash

    # Windows (PowerShell)
    irm https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.ps1 | iex

    rocky playground my-first-project
    cd my-first-project
    rocky compile && rocky test && rocky run

No credentials needed: the playground runs on local DuckDB.
For production deploys, use rocky plan (saves what will change) then rocky apply <plan-id> (runs it). For local work and automation, rocky run does it all in one step.
Built first for data engineers on Databricks, where silent failures cost real money and Dagster is the scheduler. Snowflake and BigQuery adapters are in Beta; see Where Rocky is today.
Each demo is in examples/playground/pocs/. cd in, run ./run.sh.
Compare two versions of your project and get a list of which downstream tables and columns each change affects — ready to paste into a GitHub PR comment.
POC: 06-developer-experience/11-lineage-diff
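To make the idea concrete, here is a minimal sketch of how a column-level impact list can be derived from lineage edges. It is not Rocky's implementation: the `LINEAGE` graph, the model and column names, and the `downstream_of` helper are all hypothetical, chosen only to illustrate the traversal.

```python
from collections import deque

# Hypothetical column-level lineage: each key feeds the columns listed as its value.
LINEAGE = {
    "raw.orders.amount": ["stg_orders.amount"],
    "stg_orders.amount": ["fct_revenue.amount", "fct_orders.total"],
    "fct_revenue.amount": ["rpt_finance.revenue"],
    "raw.orders.id": ["stg_orders.order_id"],
}


def downstream_of(changed, lineage):
    """Return every column reachable from the changed columns, sorted by name."""
    seen = set()
    queue = deque(changed)
    while queue:
        col = queue.popleft()
        for child in lineage.get(col, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


# Print the affected columns as a Markdown bullet list, one per line.
affected = downstream_of(["raw.orders.amount"], LINEAGE)
print("\n".join(f"- {col}" for col in affected))
```

A breadth-first walk is enough here because the report only needs the set of affected columns, not the path to each one.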
- Schema drift recovery: source column type changes upstream; Rocky detects it and rebuilds safely.
- Data contracts: missing required columns, dropped protected columns, or unsafe type changes surface as errors (E010/E013) before a row is written.
- BigQuery cost to the byte: bytes_scanned in the run receipt matches BigQuery's billing number exactly (requires credentials).
- Named branches + replay: run against an isolated schema copy, inspect, then drop or promote.
- Column lineage: trace a column in a downstream model back to its source.
- Incremental loads: set strategy = "incremental" and Rocky only processes new rows each run.
- Data masking: tag PII columns, set masking per environment, fail the check if anything goes out unmasked.
- AI model generation: describe what you want; Rocky writes the SQL, checks it, and retries if something's wrong.
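As a rough illustration of what a check-time contract comparison looks like, the sketch below compares an expected schema with an observed one and reports problems before any data moves. It is an assumption-laden example, not Rocky's checker: the `check_contract` function and the schemas are hypothetical, and the codes follow the mapping in the dbt Core comparison table in this README (E010 for a missing required column, E013 for a changed column type).

```python
def check_contract(expected, observed, required):
    """Compare an expected schema with the observed one.

    Returns a list of (code, message) tuples. The codes are borrowed from
    this README (E010: required column missing, E013: column type changed);
    the comparison logic itself is only an illustration.
    """
    problems = []
    for col in sorted(required):
        if col not in observed:
            problems.append(("E010", f"required column '{col}' is missing"))
    for col, old_type in sorted(expected.items()):
        new_type = observed.get(col)
        if new_type is not None and new_type != old_type:
            problems.append(
                ("E013", f"column '{col}' changed type: {old_type} -> {new_type}")
            )
    return problems


# Hypothetical schemas: 'note' was dropped upstream and 'amount' changed type.
expected = {"order_id": "BIGINT", "amount": "DECIMAL(18,2)", "note": "STRING"}
observed = {"order_id": "BIGINT", "amount": "STRING"}

for code, message in check_contract(expected, observed, required={"order_id", "amount", "note"}):
    print(code, message)
```

The point of running this kind of comparison at check time is that a non-empty result can block a PR before a single row is written.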
The checker runs as a language server in VS Code, so you see type mismatches and broken references while you write, not in CI. Column types show on hover, and go-to-definition works across all your models.
The Rocky Inspector shows a model's columns, where each came from, its tests, cost, and which columns hold sensitive data.
Install the VS Code extension →
Core features are production-ready on Databricks: the checker, named branches, replay, column lineage, rule enforcement, per-model cost. Everything else is in progress.
- Databricks is the 2026 focus. Snowflake, BigQuery, and Trino work for the core loop but aren't as thorough yet. Talk to us if you need them in production now.
- AI features are early. Generate → check → fix is shipped. Mass refactoring, auto-migration on type changes, and assertion generation are on the roadmap.
- Iceberg. Reading from a catalog is Beta. Writing straight to Iceberg is planned for 2026.
- No built-in metrics layer. Use Cube, the dbt Semantic Layer, or whatever you have.
- Dagster is the one built-in scheduler integration (dagster-rocky). For anything else, use the rocky-sdk Python client or rocky serve.
Open a discussion if any of these are a blocker.
| Problem | dbt Core | Rocky |
|---|---|---|
| Source column type changes | Silent | E013 at check time, blocks PR |
| Required column disappears | Opt-in contract: enforced | E010 at check time, blocks PR |
| Column renamed, unknown blast radius | Table-level lineage, post-hoc | rocky lineage-diff at PR time, column-level |
| SELECT * pulls an unexpected column | Silent | P002 warning, downstream models named |
| Snowflake-only SQL in a Databricks project | No check | P001 portability warning |
| Run costs double, no one knows which model | Dig through warehouse history | cost_summary per model, every run |
| Auditor asks what changed fct_revenue.amount | Run history, no code record | rocky replay <run_id> |
| Pipeline fails at 3 AM, half already ran | dbt retry from failed model | rocky run --resume-latest, skips succeeded models |
rocky import-dbt converts a vanilla dbt Core project in one command. Rocky also closes the dbt-Core feature gaps teams hit first: deterministic surrogate keys ([[surrogate_key]], the same value dbt_utils.generate_surrogate_key produces on each warehouse), named data-quality tests defined once and reused by name (the analogue of dbt Core's generic tests), and fixture-driven unit tests that mock upstream inputs and assert the output. See the model format reference.
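For readers unfamiliar with deterministic surrogate keys, the sketch below shows the general recipe in plain Python. It assumes the approach used by the dbt_utils macro as commonly documented (cast each field to a string, replace nulls with a fixed placeholder, join with `-`, then take an MD5 hash); the `surrogate_key` function is hypothetical and says nothing about how Rocky renders the key on a given warehouse, where string casting rules can differ.

```python
import hashlib

# Placeholder for NULL inputs, assumed to match the dbt_utils default.
NULL_PLACEHOLDER = "_dbt_utils_surrogate_key_null_"


def surrogate_key(*fields):
    """Hash the fields into a 32-character hex key; same inputs, same key."""
    parts = [NULL_PLACEHOLDER if f is None else str(f) for f in fields]
    return hashlib.md5("-".join(parts).encode("utf-8")).hexdigest()


# The same inputs always produce the same key, and NULL differs from "".
key = surrogate_key(42, "2026-01-31")
print(key == surrogate_key(42, "2026-01-31"))
print(surrogate_key(42, None) == surrogate_key(42, ""))
```

Using a placeholder for nulls keeps a row with a missing field from colliding with a row whose field is an empty string.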
- No vendor lock-in. rocky emit-sql renders every transformation model as plain, dependency-ordered SQL, offline with no warehouse connection. It's a one-command export, not a rewrite, so adopting Rocky is never a one-way door. See No lock-in.
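"Dependency-ordered" means each model's SQL appears after the models it reads from. A minimal sketch of that ordering, using Python's standard-library topological sorter, is shown below; the `DEPENDS_ON` graph and the `emit_order` helper are hypothetical and are not taken from Rocky's code.

```python
from graphlib import TopologicalSorter

# Hypothetical models mapped to the set of models they select from.
DEPENDS_ON = {
    "stg_orders": set(),
    "stg_customers": set(),
    "fct_revenue": {"stg_orders", "stg_customers"},
    "rpt_finance": {"fct_revenue"},
}


def emit_order(depends_on):
    """Return model names so that every model comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = emit_order(DEPENDS_ON)
print(order)
```

Running the exported statements in this order on any warehouse would create upstream tables before the models that query them. The relative order of independent models, such as the two staging models here, is not significant.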
In June 2026 dbt Labs released Fusion (dbt Core v2.0, Rust, Apache 2.0, alpha) with SQL type-checking and column lineage, though it still templates with Jinja and safety checks are opt-in. Neither dbt Core v2.0 nor Fusion includes named branches, a code-and-output record per run, per-model cost as a built-in, a cross-database portability check, or declarative masking. Those are in dbt's paid platform; Rocky's are Apache 2.0.
| Path | What ships | Language | What it does |
|---|---|---|---|
| engine/ | rocky CLI | Rust | Core engine: SQL checking, drift detection, incremental loads, adapters |
| sdk/python/ | rocky-sdk (PyPI) | Python | Python client wrapping the CLI, for notebooks and scripts |
| integrations/dagster/ | dagster-rocky (PyPI) | Python | Dagster resource built on rocky-sdk |
| editors/vscode/ | Rocky VS Code extension | TypeScript | Live checking, syntax highlighting, AI commands |
| examples/playground/ | (config only) | TOML / SQL | Sample DuckDB pipeline, no credentials needed |
| Role | Adapter | Status |
|---|---|---|
| Warehouse | Databricks | Production |
| Warehouse | Snowflake | Beta |
| Warehouse | BigQuery | Beta |
| Warehouse | DuckDB | Local / Testing |
| Warehouse | Trino | Beta |
| Source | Fivetran | Production |
| Source | Airbyte | Beta |
| Source | Iceberg | Beta |
| Source | Manual | Production |
Building a connector for ClickHouse, Redshift, or another warehouse? See the Adapter SDK guide and the skeleton POC.
    git clone https://github.com/rocky-data/rocky.git
    cd rocky
    just build   # engine + sdk + dagster + vscode
    just test
    just lint

See CONTRIBUTING.md for per-subproject build commands.
Each artifact ships independently via CI-driven tags:
- engine-v* → Rocky CLI binary on GitHub Releases (macOS, Linux, Windows)
- sdk-v* → rocky-sdk on PyPI
- dagster-v* → dagster-rocky on PyPI
- vscode-v* → Rocky extension on the VS Code Marketplace
Full docs at rocky-data.dev.
New to Rocky? ROCKY_EXPLAINED.md is a plain-English walkthrough of the whole system, with diagrams.
See CONTRIBUTING.md. Schema or DSL changes need to update all dependent pieces at once — read the cross-project change guidance before opening a PR.
Rocky is free and open source. If it saves your team time, consider sponsoring the project.


