Rocky

Rocky checks your SQL data pipelines and catches problems before they reach your warehouse.

Works with Databricks, Snowflake, BigQuery, and DuckDB. You keep your warehouse and your existing SQL. Apache 2.0.

The failures that cost data teams the most are the quiet ones: a source column type changes and breaks something downstream, a column gets renamed and three models stop working, a query runs fine in dev but fails in prod. Rocky catches all of these at check time, before anything runs.

[Demo recording: Rocky quickstart. Create a project, compile, and run 3 models in under 15s.]

Try it in 60 seconds

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.ps1 | iex

rocky playground my-first-project
cd my-first-project
rocky compile && rocky test && rocky run

No credentials needed — the playground runs on local DuckDB.

For production deploys, use rocky plan (saves what will change) then rocky apply <plan-id> (runs it). For local work and automation, rocky run does it all in one step.

Who Rocky is for

Built first for data engineers on Databricks, where silent failures cost real money and Dagster is the scheduler. Snowflake and BigQuery adapters are in Beta; see Where Rocky is today.

See it in action

Each demo lives in examples/playground/pocs/. cd into a demo's directory and run ./run.sh.

See what breaks before you merge, with rocky lineage-diff

Compare two versions of your project and get a list of which downstream tables and columns each change affects — ready to paste into a GitHub PR comment.

[Screenshot: rocky lineage-diff main lists added and removed columns across two models, with downstream consumers per change.]

POC: 06-developer-experience/11-lineage-diff
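
To make the idea of a column-level diff concrete, here is a minimal Python sketch. It is not Rocky's implementation and does not call Rocky; the model names, the column sets, and the `consumers` map are hypothetical, hand-written inputs. It only illustrates the shape of the result: which columns were added or removed per model, and which downstream models read each removed column.

```python
from typing import Dict, List, Set

# Hypothetical input shape: model name -> set of column names.
Schema = Dict[str, Set[str]]


def column_diff(base: Schema, head: Schema) -> Dict[str, Dict[str, List[str]]]:
    """Return added and removed columns for every model that changed."""
    report: Dict[str, Dict[str, List[str]]] = {}
    for model in sorted(set(base) | set(head)):
        old = base.get(model, set())
        new = head.get(model, set())
        added, removed = sorted(new - old), sorted(old - new)
        if added or removed:
            report[model] = {"added": added, "removed": removed}
    return report


def affected_consumers(
    report: Dict[str, Dict[str, List[str]]],
    consumers: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Map each removed 'model.column' to the downstream models that read it."""
    hits: Dict[str, List[str]] = {}
    for model, change in report.items():
        for col in change["removed"]:
            key = f"{model}.{col}"
            hits[key] = sorted(consumers.get(key, []))
    return hits


# A rename shows up as one removed column plus one added column.
base = {"stg_orders": {"order_id", "amount", "cust_id"}}
head = {"stg_orders": {"order_id", "amount", "customer_id"}}
consumers = {"stg_orders.cust_id": ["fct_revenue", "dim_customers"]}

report = column_diff(base, head)
impact = affected_consumers(report, consumers)
print(report)
print(impact)
```

In this toy version the consumer map is supplied by hand; a real tool would have to derive it from the SQL of each downstream model.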

More demos

  • Schema drift recovery: source column type changes upstream; Rocky detects it and rebuilds safely.
  • Data contracts: missing required columns, dropped protected columns, or unsafe type changes surface as errors (E010 / E013) before a row is written.
  • BigQuery cost to the byte: bytes_scanned in the run receipt matches BigQuery's billing number exactly (requires credentials).
  • Named branches + replay: run against an isolated schema copy, inspect, then drop or promote.
  • Column lineage: trace a column in a downstream model back to its source.
  • Incremental loads: set strategy = "incremental" and Rocky only processes new rows each run.
  • Data masking: tag PII columns, set masking per environment, fail the check if anything goes out unmasked.
  • AI model generation: describe what you want; Rocky writes the SQL, checks it, and retries if something's wrong.
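
As an illustration of the data-contract demo above, the following Python sketch checks a schema against a contract before anything is written. It is a toy under stated assumptions, not Rocky's checker: the `Contract` class, the `SAFE_TYPE_CHANGES` set, and the message strings are all hypothetical, and Rocky's own error codes are deliberately not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Contract:
    """Hypothetical contract: columns that must exist, and columns that must not be dropped."""
    required: Set[str] = field(default_factory=set)
    protected: Set[str] = field(default_factory=set)


# Assumption: only these (old, new) widenings count as safe in this sketch.
SAFE_TYPE_CHANGES = {("INT", "BIGINT"), ("FLOAT", "DOUBLE")}


def check_contract(
    contract: Contract,
    previous: Dict[str, str],
    current: Dict[str, str],
) -> List[str]:
    """Compare two column->type maps and return a list of violations."""
    errors: List[str] = []
    for col in sorted(contract.required - set(current)):
        errors.append(f"required column missing: {col}")
    for col in sorted((contract.protected & set(previous)) - set(current)):
        errors.append(f"protected column dropped: {col}")
    for col in sorted(set(previous) & set(current)):
        old, new = previous[col], current[col]
        if old != new and (old, new) not in SAFE_TYPE_CHANGES:
            errors.append(f"unsafe type change on {col}: {old} -> {new}")
    return errors


previous = {"order_id": "BIGINT", "amount": "DOUBLE", "email": "STRING"}
current = {"order_id": "INT", "amount": "DOUBLE"}
contract = Contract(required={"order_id", "customer_id"}, protected={"email"})

errors = check_contract(contract, previous, current)
for message in errors:
    print(message)
```

The point of the design is that the check needs only schemas, not rows, so it can run before any query touches the warehouse.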

In your editor

The checker runs as a language server in VS Code, so you see type mismatches and broken references while you write, not in CI. Column types show on hover, and go-to-definition works across all your models.

The Rocky Inspector shows a model's columns, where each came from, its tests, cost, and which columns hold sensitive data.

[Screenshot: the Rocky Inspector's Overview as a model trust dashboard, its Governance card flagging two classified columns with one left unmasked.]

Install the VS Code extension →

Where Rocky is today

Core features are production-ready on Databricks: the checker, named branches, replay, column lineage, rule enforcement, per-model cost. Everything else is in progress.

  • Databricks is the 2026 focus. Snowflake, BigQuery, and Trino work for the core loop but aren't as thorough yet. Talk to us if you need them in production now.
  • AI features are early. Generate → check → fix is shipped. Mass refactoring, auto-migration on type changes, and assertion generation are on the roadmap.
  • Iceberg. Reading from a catalog is Beta. Writing straight to Iceberg is planned for 2026.
  • No built-in metrics layer. Use Cube, the dbt Semantic Layer, or whatever you have.
  • Dagster is the one built-in scheduler integration (dagster-rocky). For anything else, use the rocky-sdk Python client or rocky serve.

Open a discussion if any of these are a blocker.

How it compares to dbt Core

| Problem | dbt Core | Rocky |
|---|---|---|
| Source column type changes | Silent | E013 at check time, blocks PR |
| Required column disappears | Opt-in contract: enforced | E010 at check time, blocks PR |
| Column renamed, unknown blast radius | Table-level lineage, post-hoc | rocky lineage-diff at PR time, column-level |
| SELECT * pulls an unexpected column | Silent | P002 warning, downstream models named |
| Snowflake-only SQL in a Databricks project | No check | P001 portability warning |
| Run costs double, no one knows which model | Dig through warehouse history | cost_summary per model, every run |
| Auditor asks what changed fct_revenue.amount | Run history, no code record | rocky replay <run_id> |
| Pipeline fails at 3 AM, half already ran | dbt retry from failed model | rocky run --resume-latest, skips succeeded models |

rocky import-dbt converts a vanilla dbt Core project in one command. Rocky also closes the dbt-Core feature gaps teams hit first: deterministic surrogate keys ([[surrogate_key]], the same value dbt_utils.generate_surrogate_key produces on each warehouse), named data-quality tests defined once and reused by name (the analogue of dbt Core's generic tests), and fixture-driven unit tests that mock upstream inputs and assert the output. See the model format reference.
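
A deterministic surrogate key can be sketched in a few lines of Python. This is an illustration of the general technique (hash the key columns after joining them with a separator and replacing NULLs with a sentinel), not Rocky's code. The `NULL_SENTINEL` and `SEPARATOR` values are assumptions modeled on our understanding of dbt_utils.generate_surrogate_key, and the sketch does not model how each warehouse casts non-text values to strings.

```python
import hashlib
from typing import Optional, Sequence

# Assumptions for this sketch, modeled on the dbt_utils approach.
NULL_SENTINEL = "_dbt_utils_surrogate_key_null_"
SEPARATOR = "-"


def surrogate_key(values: Sequence[Optional[object]]) -> str:
    """Hash the key columns into one stable hex digest.

    NULLs become a sentinel so that (None, "a") and ("", "a") do not collide.
    """
    parts = [NULL_SENTINEL if v is None else str(v) for v in values]
    return hashlib.md5(SEPARATOR.join(parts).encode("utf-8")).hexdigest()


# The same inputs always produce the same key, on any machine.
key_one = surrogate_key([1001, "2026-01-15"])
key_two = surrogate_key([1001, "2026-01-15"])
print(key_one == key_two)
```

Because the key depends only on the input values, rebuilding a table from scratch yields the same keys, which keeps joins against older snapshots stable.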

  • No vendor lock-in. rocky emit-sql renders every transformation model as plain, dependency-ordered SQL, offline with no warehouse connection. It's a one-command export, not a rewrite, so adopting Rocky is never a one-way door. See No lock-in.
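
"Dependency-ordered" here means that every model's SQL appears after the SQL of the models it reads from. The Python sketch below shows that ordering with the standard library's graphlib. It does not invoke rocky emit-sql, and the model names and dependency map are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical model graph: each model maps to the set of models it selects from.
deps = {
    "stg_orders": set(),
    "stg_customers": set(),
    "fct_revenue": {"stg_orders", "stg_customers"},
    "rpt_monthly": {"fct_revenue"},
}

# static_order() yields each model only after all of its upstream models.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Running the models' SQL in this order guarantees that each statement finds its inputs already built; graphlib raises CycleError if the graph contains a circular reference.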

In June 2026 dbt Labs released Fusion (dbt Core v2.0, Rust, Apache 2.0, alpha) with SQL type-checking and column lineage, though it still templates with Jinja and safety checks are opt-in. Neither dbt Core v2.0 nor Fusion includes named branches, a code-and-output record per run, per-model cost as a built-in, a cross-database portability check, or declarative masking. Those are in dbt's paid platform; Rocky's are Apache 2.0.

Subprojects

| Path | What ships | Language | What it does |
|---|---|---|---|
| engine/ | rocky CLI | Rust | Core engine: SQL checking, drift detection, incremental loads, adapters |
| sdk/python/ | rocky-sdk (PyPI) | Python | Python client wrapping the CLI, for notebooks and scripts |
| integrations/dagster/ | dagster-rocky (PyPI) | Python | Dagster resource built on rocky-sdk |
| editors/vscode/ | Rocky VS Code extension | TypeScript | Live checking, syntax highlighting, AI commands |
| examples/playground/ | (config only) | TOML / SQL | Sample DuckDB pipeline, no credentials needed |

Adapters

| Role | Adapter | Status |
|---|---|---|
| Warehouse | Databricks | Production |
| Warehouse | Snowflake | Beta |
| Warehouse | BigQuery | Beta |
| Warehouse | DuckDB | Local / Testing |
| Warehouse | Trino | Beta |
| Source | Fivetran | Production |
| Source | Airbyte | Beta |
| Source | Iceberg | Beta |
| Source | Manual | Production |

Building a connector for ClickHouse, Redshift, or another warehouse? See the Adapter SDK guide and the skeleton POC.

Building from source

git clone https://github.com/rocky-data/rocky.git
cd rocky
just build   # engine + sdk + dagster + vscode
just test
just lint

See CONTRIBUTING.md for per-subproject build commands.

Releases

Each artifact ships independently via CI-driven tags:

  • engine-v* → Rocky CLI binary on GitHub Releases (macOS, Linux, Windows)
  • sdk-v* → rocky-sdk on PyPI
  • dagster-v* → dagster-rocky on PyPI
  • vscode-v* → Rocky extension on the VS Code Marketplace

Documentation

Full docs at rocky-data.dev.

New to Rocky? ROCKY_EXPLAINED.md is a plain-English walkthrough of the whole system, with diagrams.

Contributing

See CONTRIBUTING.md. Schema or DSL changes need to update all dependent pieces at once — read the cross-project change guidance before opening a PR.

Sponsoring

Rocky is free and open source. If it saves your team time, consider sponsoring the project.

License

Apache 2.0
