GitHub - rocky-data/rocky: The typed graph between your code and whichever warehouse, table format, or query engine you've chosen — typed compiler, branches, replay, column-level lineage, compile-time contracts, per-model cost. Adapters: Databricks, Snowflake, BigQuery, DuckDB. Single static Rust binary. Apache 2.0.

Rocky

Rocky checks your SQL data pipelines and catches problems before they reach your warehouse.

Works with Databricks, Snowflake, BigQuery, and DuckDB. You keep your warehouse and your existing SQL. Apache 2.0.

The failures that cost data teams the most are the quiet ones: a source column type changes and breaks something downstream, a column gets renamed and three models stop working, a query runs fine in dev but fails in prod. Rocky catches all of these at check time, before anything runs.

Try it in 60 seconds

```sh
# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.sh | bash
```

```powershell
# Windows (PowerShell)
irm https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.ps1 | iex
```

```sh
rocky playground my-first-project
cd my-first-project
rocky compile && rocky test && rocky run
```

No credentials needed — the playground runs on local DuckDB.

For production deploys, run `rocky plan` to save what will change, then `rocky apply <plan-id>` to execute that saved plan. For local work and automation, `rocky run` does both in one step.
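To make the plan/apply split concrete, here is a minimal Python sketch of one way such a workflow can behave. It assumes a plan records the pending changes plus a fingerprint of the deployed state it was computed against. `make_plan`, `apply_plan`, the in-memory `PLANS` store, and the stale-plan check are all hypothetical illustrations, not Rocky's implementation or CLI behavior.

```python
import hashlib
import json

# Hypothetical in-memory store of saved plans, keyed by plan id.
PLANS: dict[str, dict] = {}


def state_fingerprint(state: dict[str, str]) -> str:
    """Hash the deployed model definitions so apply can detect drift."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def make_plan(desired: dict[str, str], deployed: dict[str, str]) -> str:
    """Record what would change (create/update/drop) and return a short plan id."""
    changes = {
        "create": sorted(set(desired) - set(deployed)),
        "update": sorted(
            name for name in set(desired) & set(deployed) if desired[name] != deployed[name]
        ),
        "drop": sorted(set(deployed) - set(desired)),
    }
    plan = {"changes": changes, "desired": desired, "base": state_fingerprint(deployed)}
    plan_id = hashlib.sha256(json.dumps(plan, sort_keys=True).encode()).hexdigest()[:12]
    PLANS[plan_id] = plan
    return plan_id


def apply_plan(plan_id: str, deployed: dict[str, str]) -> dict[str, str]:
    """Execute exactly the saved plan; refuse if the target changed since planning (assumption)."""
    plan = PLANS[plan_id]
    if state_fingerprint(deployed) != plan["base"]:
        raise RuntimeError(f"plan {plan_id} is stale; re-run plan")
    new_state = dict(deployed)
    for name in plan["changes"]["drop"]:
        del new_state[name]
    for name in plan["changes"]["create"] + plan["changes"]["update"]:
        new_state[name] = plan["desired"][name]
    return new_state
```

Because apply replays a stored plan instead of recomputing it, what gets deployed is exactly what was reviewed; the staleness check in this sketch is one way to guard against the target moving in between.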

Who Rocky is for

Rocky is built first for data engineers on Databricks who use Dagster as their scheduler and for whom silent failures cost real money. The Snowflake and BigQuery adapters are in Beta; see Where Rocky is today.

See it in action

Each demo lives under `examples/playground/pocs/`. `cd` into its directory and run `./run.sh`.

See what breaks before you merge, with `rocky lineage-diff`

Compare two versions of your project and get a list of the downstream tables and columns each change affects, ready to paste into a GitHub PR comment.

POC: `06-developer-experience/11-lineage-diff`
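At its core, a column-level impact report is a graph walk. The Python sketch below assumes column lineage is already available as a mapping from each `table.column` to the columns it feeds, and that the changed columns are already known; `rocky lineage-diff` derives both from the two project versions, which is not shown here. The function names and the comment format are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical column-level lineage: each "table.column" maps to the columns it feeds.
Lineage = dict[str, list[str]]


def downstream(lineage: Lineage, changed: list[str]) -> set[str]:
    """Breadth-first walk from the changed columns to every column they feed."""
    seen: set[str] = set()
    queue = deque(changed)
    while queue:
        col = queue.popleft()
        for child in lineage.get(col, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def pr_comment(lineage: Lineage, changed: list[str]) -> str:
    """Group affected columns by table and render a Markdown list for a PR comment."""
    by_table: dict[str, list[str]] = defaultdict(list)
    for col in sorted(downstream(lineage, changed)):
        table, column = col.rsplit(".", 1)
        by_table[table].append(column)
    lines = [f"Changed: {', '.join(changed)}", ""]
    for table, cols in sorted(by_table.items()):
        lines.append(f"- `{table}`: {', '.join(cols)}")
    return "\n".join(lines)
```

For example, if `stg_orders.amount` feeds `fct_revenue.amount` and `fct_revenue.amount_usd`, and the latter feeds `rpt_daily.revenue`, changing `stg_orders.amount` lists both `fct_revenue` columns and `rpt_daily.revenue`.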

More demos

Schema drift recovery: source column type changes upstream; Rocky detects it and rebuilds safely.
Data contracts: missing required columns, dropped protected columns, or unsafe type changes surface as errors (E010 / E013) before a row is written.
BigQuery cost to the byte: `bytes_scanned` in the run receipt matches BigQuery's billing number exactly (requires credentials).
Named branches + replay: run against an isolated schema copy, inspect, then drop or promote.
Column lineage: trace a column in a downstream model back to its source.
Incremental loads: set `strategy = "incremental"` and Rocky processes only new rows on each run.
Data masking: tag PII columns, set masking per environment, fail the check if anything goes out unmasked.
AI model generation: describe what you want; Rocky writes the SQL, checks it, and retries if something's wrong.
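The data-contract checks boil down to comparing an expected schema with the one a change would produce. This Python sketch assumes each schema is known as a column-to-type mapping and uses the codes this README pairs with each problem (`E010` for a missing required column, `E013` for a changed column type). The `SAFE_WIDENINGS` allowlist, `Diagnostic`, and `check_contract` are hypothetical; Rocky's real type rules and messages are not reproduced here.

```python
from dataclasses import dataclass

# Assumption: type changes in this allowlist are safe widenings; any other change is flagged.
SAFE_WIDENINGS = {("INT", "BIGINT"), ("FLOAT", "DOUBLE")}


@dataclass(frozen=True)
class Diagnostic:
    code: str
    message: str


def check_contract(required: set[str], old: dict[str, str], new: dict[str, str]) -> list[Diagnostic]:
    """Flag missing required columns and unsafe type changes between two schemas."""
    diags: list[Diagnostic] = []
    # Required columns that no longer exist in the new schema.
    for col in sorted(required - set(new)):
        diags.append(Diagnostic("E010", f"required column '{col}' is missing"))
    # Columns present in both schemas whose type changed in a non-allowlisted way.
    for col in sorted(set(old) & set(new)):
        before, after = old[col], new[col]
        if before != after and (before, after) not in SAFE_WIDENINGS:
            diags.append(Diagnostic("E013", f"column '{col}' changed type {before} -> {after}"))
    return diags
```

Under these assumptions, changing `amount` from `DECIMAL(18,2)` to `STRING` yields an `E013`, while widening `qty` from `INT` to `BIGINT` passes.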

In your editor

The checker runs as a language server in VS Code, so you see type mismatches and broken references while you write, not in CI. Column types show on hover, and go-to-definition works across all your models.

The Rocky Inspector shows a model's columns, where each came from, its tests, cost, and which columns hold sensitive data.

Install the VS Code extension →

Where Rocky is today

Core features are production-ready on Databricks: the checker, named branches, replay, column lineage, rule enforcement, per-model cost. Everything else is in progress.

Databricks is the 2026 focus. Snowflake, BigQuery, and Trino work for the core loop but aren't as mature yet. Talk to us if you need them in production now.
AI features are early. Generate → check → fix is shipped. Mass refactoring, auto-migration on type changes, and assertion generation are on the roadmap.
Iceberg. Reading from a catalog is Beta. Writing straight to Iceberg is planned for 2026.
No built-in metrics layer. Use Cube, the dbt Semantic Layer, or whatever you have.
Dagster is the only built-in scheduler integration (`dagster-rocky`). For anything else, use the `rocky-sdk` Python client or `rocky serve`.

Open a discussion if any of these are a blocker.

How it compares to dbt Core

| Problem | dbt Core | Rocky |
| --- | --- | --- |
| Source column type changes | Silent | `E013` at check time, blocks PR |
| Required column disappears | Opt-in `contract: enforced` | `E010` at check time, blocks PR |
| Column renamed, unknown blast radius | Table-level lineage, post-hoc | `rocky lineage-diff` at PR time, column-level |
| `SELECT *` pulls an unexpected column | Silent | `P002` warning, downstream models named |
| Snowflake-only SQL in a Databricks project | No check | `P001` portability warning |
| Run costs double, no one knows which model | Dig through warehouse history | `cost_summary` per model, every run |
| Auditor asks what changed `fct_revenue.amount` | Run history, no code record | `rocky replay <run_id>` |
| Pipeline fails at 3 AM, half already ran | `dbt retry` from failed model | `rocky run --resume-latest`, skips succeeded models |
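The `--resume-latest` row can be pictured as: rerun, in dependency order, every model whose last recorded status was not a success. This Python sketch assumes a dependency map and a per-model status record from the previous run; the `"success"` status string and `resume_order` are hypothetical, and Rocky's actual run-record format is not described here.

```python
from graphlib import TopologicalSorter


def resume_order(deps: dict[str, set[str]], last_run: dict[str, str]) -> list[str]:
    """Models to rerun, upstream first, skipping any whose last status was a success.

    deps maps each model to the models it reads from. A model with no entry in
    last_run (for example, one skipped after an upstream failure) is rerun.
    """
    order = TopologicalSorter(deps).static_order()
    return [model for model in order if last_run.get(model) != "success"]
```

In a chain `stg_orders -> int_orders -> fct_revenue -> rpt_daily` where the first two succeeded and `fct_revenue` failed, only `fct_revenue` and `rpt_daily` are rerun.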

`rocky import-dbt` converts a vanilla dbt Core project in one command. Rocky also closes the dbt Core feature gaps teams hit first: deterministic surrogate keys (`[[surrogate_key]]`, which produces the same value as `dbt_utils.generate_surrogate_key` on each warehouse), named data-quality tests defined once and reused by name (the analogue of dbt Core's generic tests), and fixture-driven unit tests that mock upstream inputs and assert the output. See the model format reference.
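For reference, `dbt_utils.generate_surrogate_key` (with its default null handling) casts each field to a string, replaces NULLs with the placeholder `'_dbt_utils_surrogate_key_null_'`, joins the fields with `'-'`, and takes the MD5 hash as a hex string. The Python sketch below mirrors that recipe. It assumes the fields have already been rendered to strings exactly as the target warehouse's cast would render them, which is where per-warehouse differences come from. It illustrates the recipe only and is not Rocky's `[[surrogate_key]]` implementation.

```python
import hashlib
from typing import Optional

# Placeholder dbt_utils substitutes for NULL fields by default.
NULL_PLACEHOLDER = "_dbt_utils_surrogate_key_null_"


def surrogate_key(*fields: Optional[str]) -> str:
    """MD5 hex digest of the '-'-joined fields, with None replaced by the NULL placeholder.

    Assumes each non-None field is already the string the warehouse's cast would produce.
    """
    parts = [NULL_PLACEHOLDER if f is None else f for f in fields]
    return hashlib.md5("-".join(parts).encode("utf-8")).hexdigest()
```

Replacing NULLs with a placeholder keeps `("a", None)` and `("a", "")` from hashing to the same key.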

No vendor lock-in. `rocky emit-sql` renders every transformation model as plain, dependency-ordered SQL, offline and without a warehouse connection. It's a one-command export, not a rewrite, so adopting Rocky is never a one-way door. See No lock-in.
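Dependency ordering is a topological sort of the model graph. This minimal Python sketch uses the standard library's `graphlib` and assumes each model is known by name with its SQL text and the set of names it reads from. Names outside the model set (such as raw source tables) are treated as external and skipped. The output layout and `emit_sql` are hypothetical, not necessarily what `rocky emit-sql` produces.

```python
from graphlib import TopologicalSorter


def emit_sql(models: dict[str, str], deps: dict[str, set[str]]) -> str:
    """Concatenate model SQL so each model appears after every model it reads from."""
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in models})
    chunks = []
    for name in sorter.static_order():
        if name not in models:  # external source table, nothing to emit
            continue
        chunks.append(f"-- model: {name}\n{models[name].strip()};")
    return "\n\n".join(chunks)
```

`static_order()` raises `graphlib.CycleError` if the models form a cycle, which is the right failure for an export that must be runnable top to bottom.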

In June 2026 dbt Labs released Fusion (dbt Core v2.0, Rust, Apache 2.0, alpha) with SQL type-checking and column lineage, though it still templates with Jinja and safety checks are opt-in. Neither dbt Core v2.0 nor Fusion includes named branches, a code-and-output record per run, per-model cost as a built-in, a cross-database portability check, or declarative masking. Those are in dbt's paid platform; Rocky's are Apache 2.0.

Subprojects

| Path | What ships | Language | What it does |
| --- | --- | --- | --- |
| `engine/` | `rocky` CLI | Rust | Core engine: SQL checking, drift detection, incremental loads, adapters |
| `sdk/python/` | `rocky-sdk` (PyPI) | Python | Python client wrapping the CLI, for notebooks and scripts |
| `integrations/dagster/` | `dagster-rocky` (PyPI) | Python | Dagster resource built on `rocky-sdk` |
| `editors/vscode/` | Rocky VS Code extension | TypeScript | Live checking, syntax highlighting, AI commands |
| `examples/playground/` | (config only) | TOML / SQL | Sample DuckDB pipeline, no credentials needed |

Adapters

| Role | Adapter | Status |
| --- | --- | --- |
| Warehouse | Databricks | Production |
| Warehouse | Snowflake | Beta |
| Warehouse | BigQuery | Beta |
| Warehouse | DuckDB | Local / Testing |
| Warehouse | Trino | Beta |
| Source | Fivetran | Production |
| Source | Airbyte | Beta |
| Source | Iceberg | Beta |
| Source | Manual | Production |

Building a connector for ClickHouse, Redshift, or another warehouse? See the Adapter SDK guide and the skeleton POC.

Building from source

```sh
git clone https://github.com/rocky-data/rocky.git
cd rocky
just build   # engine + sdk + dagster + vscode
just test
just lint
```

See CONTRIBUTING.md for per-subproject build commands.

Releases

Each artifact ships independently via CI-driven tags:

`engine-v*` → Rocky CLI binary on GitHub Releases (macOS, Linux, Windows)
`sdk-v*` → `rocky-sdk` on PyPI
`dagster-v*` → `dagster-rocky` on PyPI
`vscode-v*` → Rocky extension on the VS Code Marketplace

Documentation

Full docs at rocky-data.dev.

New to Rocky? ROCKY_EXPLAINED.md is a plain-English walkthrough of the whole system, with diagrams.

Contributing

See CONTRIBUTING.md. Schema or DSL changes must update all dependent pieces at once, so read the cross-project change guidance before opening a PR.

Sponsoring

Rocky is free and open source. If it saves your team time, consider sponsoring the project.

License

Apache 2.0
