Cross-agent guide (Claude, Antigravity, Gemini, Cursor, Copilot, …) for authoring sqlanvil
data projects (.sqlx models, workflow_settings.yaml, .df-credentials.json) that target
PostgreSQL / Supabase / MySQL / MariaDB.
Working on the sqlanvil codebase itself (TypeScript, Bazel, protos)? See
CLAUDE.md. This file is about writing projects that sqlanvil compiles and runs. Canonical copy of this guidance also ships as the Claude skillsqlanvil-engineering-fundamentals— keep the two in sync.
sqlanvil is a fork of Dataform repositioned for PostgreSQL, Supabase, and MySQL/MariaDB. Your
Dataform/BigQuery instincts are mostly right — .sqlx files, config {}, ${ref()},
${self()}, declarations, assertions, tags, incremental tables, pre_operations/post_operations
all work the same. This file is the delta: where assuming Dataform/BigQuery produces broken
sqlanvil code.
Core rule: on Postgres/Supabase, reach for the postgres: {} config block and idiomatic
Postgres — never BigQuery options or hand-rolled DDL workarounds. MySQL/MariaDB is different —
smaller surface (no options block, no matviews) and several rules invert; see the MySQL section
below before authoring a warehouse: mysql project.
Source of truth for config fields: protos/configs.proto (PostgresOptions,
PostgresConnection, TableConfig, IncrementalTableConfig, ViewConfig). When unsure, read it.
warehouse: postgres # flat string ("postgres" or "supabase") — NOT nested
defaultDataset: public # the Postgres SCHEMA
defaultAssertionDataset: sqlanvil_assertions
sqlanvilCoreVersion: 1.1.1 # sqlanvil's OWN SemVer line (NOT dataformCoreVersion); 1.1.1+ for named connections
vars:
someVar: valueDrop defaultProject and defaultLocation (BigQuery-only). dataform.json is the legacy
upstream name; sqlanvil uses workflow_settings.yaml.
Filename is literally .df-credentials.json (fork kept upstream's name; default, override with
--credentials). Flat PostgresConnection — not nested under "postgres", not the
BigQuery shape:
{
"host": "localhost",
"port": 5432,
"database": "postgres",
"user": "postgres",
"password": "password",
"sslMode": "require",
"defaultSchema": "public"
}Exact field names: host port database user password sslMode defaultSchema — not
username/databaseName/ssl. Gitignore it. Supabase: host: db.<ref>.supabase.co, port 5432
(or 6543 pooler), sslMode: "require".
Use postgres: {} on type: "table" and type: "incremental". Do not create indexes or set
fillfactor via post_operations.
config {
type: "table",
postgres: {
fillfactor: 80,
unlogged: false,
tablespace: "fast_ssd",
indexes: [
{ name: "idx_email", columns: ["email"], unique: true },
{ name: "idx_props", columns: ["props"], method: 2, opclass: "jsonb_path_ops" }
]
}
}
Index method is a NUMERIC ENUM, not a string: BTREE=0, HASH=1, GIN=2, GIST=3, BRIN=4.
method: "btree" fails the config type check (parser uses protobufjs create()). Omit for btree.
Index fields: name, columns[](array), method(int), where(partial predicate),
unique(bool), include[](array, covering), opclass(single string applied to every indexed
column — opclass: "gin_trgm_ops", not an array).
postgres: {
partition: {
kind: 0, // RANGE=0, LIST=1, HASH=2 (numeric enum)
columns: ["order_date"],
partitions: [
{ name: "y2024", values: "FROM ('2024-01-01') TO ('2025-01-01')" }
],
includeDefault: true
}
}
values is the raw FOR VALUES body matching kind. No clusterBy — use indexes.
Emits CREATE MATERIALIZED VIEW. Default = drop + recreate every run (also picks up
definition changes). Set Postgres options directly in the view config via postgres: {}:
config {
type: "view",
materialized: true,
postgres: {
refreshPolicy: "on_dependency_change", // in-place REFRESH instead of drop+recreate
noData: true, // WITH NO DATA (empty until first refresh)
indexes: [{ name: "idx_mv_id", columns: ["id"], unique: true }]
}
}
refreshPolicy: "on_dependency_change" runs REFRESH MATERIALIZED VIEW in place — but in-place
refresh does not pick up definition (SQL) changes; omit it for the safe drop+recreate default.
Three dashes on their own line separate statements in operations/pre_operations/
post_operations. sqlanvil never splits on ;, so PL/pgSQL $$ ... ; ... $$ bodies survive. A
table/view body is exactly one SELECT — no ---.
No dedicated action type; the operations generator is dialect-agnostic.
config { type: "operations", hasOutput: false }
CREATE OR REPLACE FUNCTION marts.recalc() RETURNS void LANGUAGE plpgsql AS $$
BEGIN
-- body; semicolons here are fine
END;
$$
---
CALL marts.recalc()
- Top-level
uniqueKey: ["id"]on an incremental = upsert/merge key (INSERT ... ON CONFLICT). Controls incremental writes. assertions: { uniqueKey: ["id"] }= generates a uniqueness assertion. A quality check.
Independent; can coexist.
An unwrapped pre_operations/post_operations block runs on every run of an incremental —
the create and every append (it compiles into both the create-path and incremental-path ops). A
bare ALTER TABLE ... ADD PRIMARY KEY errors on the second run (multiple primary keys). Wrap
one-time DDL so it runs only on create + --full-refresh:
post_operations {
${when(!incremental(), `ALTER TABLE ${self()} ADD PRIMARY KEY (order_date)`)}
}
(The PK also gives the incremental upsert its ON CONFLICT target.) Matview post-ops re-run every
build — keep them idempotent (matviews can't have PKs anyway).
description:(table) +columns: { col: "..." }(per-column) →COMMENT ON TABLE|VIEW|MATERIALIZED VIEW|COLUMN. Document every table.assertions: { uniqueKey, uniqueKeys: [["a","b"],["c"]], nonNull: [...], rowConditions: [...] }. Standalone =type: "assertion"whoseSELECTreturns offending rows (passes iff zero rows).
No bigquery: {}, partitionBy, clusterBy, OPTIONS(...), bigqueryPolicyTags, backticked
`project.dataset.table`, or CREATE ... NOT ENFORCED. Use the postgres: equivalents and
double-quoted identifiers.
sqlanvil init <projectDir> --warehouse postgres # or supabase — scaffolds workflow_settings.yaml + a .df-credentials.json template (BigQuery is the default; it needs a GCP project + location)
sqlanvil compile <projectDir>
sqlanvil run <projectDir> --credentials <projectDir>/.df-credentials.json
sqlanvil run <projectDir> --credentials ... --full-refresh
sqlanvil run <projectDir> --credentials ... --actions <name> --include-deps
sqlanvil test <projectDir> --credentials ...Install with npm i -g @sqlanvil/cli. (Working from a sqlanvil repo checkout instead of the installed CLI? Use ./scripts/run <verb> in place of sqlanvil <verb>.)
Boot a local PG with ./tools/postgres/run-postgres-db.sh. --dry-run only validates BigQuery
today; on Postgres it does not EXPLAIN-validate SQL (known gap).
Named environments (--environment <name>): define dev/staging/prod in an environments:
block in workflow_settings.yaml; each bundles non-secret overrides + a pointer to a gitignored
credentials file:
environments:
dev: { schemaSuffix: dev, credentials: .df-credentials.dev.json }
prod: { defaultDatabase: prod_db, vars: { region: us-prod }, credentials: .df-credentials.prod.json }sqlanvil run . --environment prod loads prod's overrides + its credentials file (works on
compile/run/test). Precedence: explicit CLI flag > environment > workflow_settings defaults
(vars merge per-key). Each env's credentials: is a path to a gitignored .df-credentials*.json
file — secrets never go in workflow_settings.yaml. --schema-suffix stays the low-level primitive.
supabase: {} adds enableRls, publishToRealtime, ownerRole,
vectors: [{ column, dimensions, indexType }]. Action types: rlsPolicy,
realtimePublication, wrapper, vectorIndex. enableRls only flips RLS on — declare policies
via the rlsPolicy action.
Reference a pre-existing, externally-managed table so ${ref()} resolves and the DAG tracks it.
Two equivalent forms:
config { type: "declaration", schema: "raw", name: "orders" } // one per .sqlx file
declare({ schema: "raw", name: "orders" }); // many per .js file
declare({ schema: "raw", name: "customers" });Declarations are exempt from --schema-suffix / tablePrefix / datasetSuffix — by design
(session.ts passes declarations separately from the actions it renames). The suffix is not
applied to a declaration's own target, and ${ref()} to a declared source resolves to the real
(unsuffixed) table even under --schema-suffix dev — so a dev run reads true sources while writing
to suffixed output. A declaration without a suffix is correct; don't "fix" it.
One adapter serves both MySQL 8 and MariaDB 11 (same warehouse: mysql, same generated SQL;
MariaDB-specific features go through operations). Deliberately smaller than Postgres — several
deltas above invert.
- Config:
warehouse: mysql;defaultDataset= the MySQL database ("schema" is the database — no catalog level).sqlanvilCoreVersion: 1.5.0+ (MySQL landed in 1.5.0). - Credentials: flat
MysqlConnection—host port database user password sslMode. NodefaultSchema(unlike Postgres).sslMode:"disable"(local) or"require". Port 3306. Identifiers compile to two-part backticks`db`.`table`. mysql: {}block (indexes + engine/charset/collation). Declare secondary indexes (indexes: [{ name?, columns, unique? }]) and table options (engine,charset,collation) in config — the rolepostgres: {}plays. Plain B-tree only; noWHERE/INCLUDE/opclass(those are Postgres-only). Partitioning, FULLTEXT/SPATIAL/prefix indexes, androw_formatare NOT in the block yet — those stay raw MySQL DDL inoperations/post_operations(wrap one-time DDL on incrementals inwhen(!incremental()), delta #9). Usemysql: {}, neverpostgres: {}, on a mysql model.- Incremental
uniqueKeyis enough — compiles toINSERT ... ON DUPLICATE KEY UPDATEand the adapter auto-creates the unique index (uq_<db>_<table>) on first/--full-refresh. Don't add your own PK/unique for the merge. - Materialized views are emulated as a refreshed table snapshot —
type: "view", materialized: truebuilds a real table (drop + CTAS each run; refresh = re-run), honoring themysql:{}block. No native matview, so it's read back as a table. description:/columns:produce real table/column comments — applied viaALTER TABLE … COMMENT/MODIFY COLUMN … COMMENTand read back frominformation_schema. Tables/incrementals only — MySQL views can't carry comments. Assertions also work.- No cross-warehouse sources — a mysql project can't read
connections(FDW is Postgres-only) and MySQL can't be a source; nointrospectfor/from MySQL. ---not;(delta #6); neverDELIMITER(client-only directive — aCREATE PROCEDUREbody's internal;survive between---separators). Backtick-quote identifiers in raw DDL.- CLI:
sqlanvil init <dir> --warehouse mysql. Local engines:./tools/mysql/run-mysql-db.sh(mysql:8 on 3306, mariadb:11 on 3307).
| You'd reach for (Dataform/BQ) | Use instead (sqlanvil/PG) |
|---|---|
dataform.json |
workflow_settings.yaml, warehouse: postgres |
defaultProject / defaultLocation |
drop them; defaultDataset = schema |
bigquery: { partitionBy, clusterBy } |
postgres: { partition: {...}, indexes: [...] } |
OPTIONS(...) / table options |
postgres: { fillfactor, unlogged, tablespace } |
CREATE INDEX in post_operations |
postgres: { indexes: [...] } |
method: "btree" (string) |
method: 0 (numeric enum) |
; between statements |
--- |
CREATE PROCEDURE + run separately |
type: "operations" |
creds {postgres:{username,databaseName,ssl}} |
flat {host,port,database,user,password,sslMode,defaultSchema} |
| in-place matview refresh | postgres: { refreshPolicy: "on_dependency_change" } in the view config (else drop+recreate) |
dataform run / npm run |
sqlanvil run ... --credentials |
dataformCoreVersion: |
sqlanvilCoreVersion: (sqlanvil's own SemVer line) |
dataform.json · defaultProject · bigquery: { · partitionBy · clusterBy · OPTIONS( ·
method: " (string) · CREATE INDEX/SET (fillfactor in post_operations · ; between
statements · ADD PRIMARY KEY/ADD CONSTRAINT in an incremental post_operations without
when(!incremental()) · a bare dataform/npm run command.
On warehouse: mysql: a postgres: {}/bigquery: {} block · defaultSchema in creds ·
double-quoted identifiers in raw DDL (MySQL uses backticks) · materialized: true (errors) · a
hand-added PK/unique just to make an incremental uniqueKey work (auto-created) · DELIMITER
around a procedure body · expecting description:/columns: to produce DB comments (no-op today).
When unsure of a postgres: field name or enum value, read protos/configs.proto.