DataLex by DuckCode AI Labs

DataLex

AI-first dbt adoption, contracts, diagrams, and manifest publishing.

DataLex is a local-first OSS workflow for teams that already use dbt. It scans your existing dbt project, lets AI propose business domains and contracts from dbt evidence, and writes reviewed DataLex artifacts back to Git.

DataLex does not replace dbt. dbt remains the source of truth for SQL, model YAML, semantic metrics, tests, exposures, and enforced physical contracts. DataLex adds the business/domain layer above dbt.

PyPI MIT License Discord Community

DuckCode Analytics Platform

DataLex is the contract layer of the DuckCode Analytics Platform — a three-layer governed analytics stack built on dbt, running on Snowflake, Databricks, and DuckDB.

| Layer | Tool | Role |
| --- | --- | --- |
| Domain contracts | DataLex ← you are here | AI proposes domain contracts from dbt evidence; humans certify; publishes datalex-manifest.json |
| Transformation | dbt | SQL models, tests, semantic metrics, physical contracts; the source of truth |
| Analytics & AI | DQL | Certified blocks reference DataLex contracts; lineage, dashboards, governed AI answers on Snowflake, Databricks, and DuckDB |

Full platform demo: jaffle-shop-duckdb — DataLex + dbt + DQL end-to-end walkthrough.

Architecture flow

DataLex turns dbt evidence into certified business contracts. AI accelerates the draft, but Git-reviewed contracts remain the trust boundary.

Figure: DataLex architecture flow from dbt evidence to AI proposals, human review, certified contracts, manifest, DQL blocks, and agents.

Why users care: DataLex gives AI enough context to draft useful governance assets, but only reviewed and certified definitions enter the manifest that downstream tools can trust.

Install

Recommended: pipx (isolated, no PATH surprises)

pipx installs DataLex into its own isolated environment and puts a single datalex on your PATH — so it can't be shadowed by a stale copy in conda/system Python (a common cause of "command not found" or "version is wrong" confusion).

python3 -m pip install --user pipx && python3 -m pipx ensurepath
pipx install 'datalex-cli[serve]'
datalex --version
datalex serve

One-line installer (does the above for you):

curl -fsSL https://raw.githubusercontent.com/duckcode-ai/DataLex/main/scripts/install.sh | bash

Upgrade any time (DataLex also tells you when a new release is out):

datalex upgrade            # upgrades in place, however you installed it
datalex upgrade --check    # just check PyPI, don't install

Alternative: pip

Use this inside an existing virtualenv or dbt repo. If you hit the wrong version or a shadowed binary, run datalex doctor — it reports every datalex on your PATH and which one is actually running.

python3 -m pip install -U 'datalex-cli[serve]'
datalex --version
datalex serve

Open http://localhost:3030.
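If `datalex --version` reports something unexpected, it can help to list every copy on your PATH by hand. The sketch below is a manual approximation of that check, not how `datalex doctor` is implemented; `list_on_path` is a hypothetical helper name, not a DataLex command.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of DataLex): print every executable named $1
# found on PATH, in lookup order. The first line printed is normally the one
# your shell runs.
list_on_path() {
  local name="$1" dir
  local -a dirs
  IFS=':' read -r -a dirs <<< "$PATH"
  for dir in "${dirs[@]}"; do
    [ -n "$dir" ] || dir="."   # an empty PATH entry means the current directory
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
}

# Usage:
#   list_on_path datalex
```

More than one line of output means an earlier copy is shadowing a later one.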

To open DataLex directly on an existing dbt repo:

cd ~/path/to/your-dbt-project
datalex serve --project-dir .

For warehouse drivers, add the matching extra:

python3 -m pip install -U 'datalex-cli[serve,duckdb]'
python3 -m pip install -U 'datalex-cli[serve,postgres]'
python3 -m pip install -U 'datalex-cli[serve,snowflake]'
python3 -m pip install -U 'datalex-cli[serve,all]'

With pipx, pass the same extras at install time, e.g. pipx install 'datalex-cli[serve,snowflake]'.

Requirements: Python 3.9+ and Git. The [serve] extra includes a portable Node runtime for the local UI.

Run with Docker

Use Docker when you do not want to install Python packages on the host.

git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex
docker build -t datalex:local .
docker run --rm -p 3030:3001 datalex:local

To use Docker with an existing dbt repo:

cd ~/path/to/your-dbt-project
docker run --rm -p 3030:3001 \
  -v "$PWD":/workspace \
  -e REPO_ROOT=/workspace \
  -e DM_CLI=/app/datalex \
  datalex:local

In the UI, choose /workspace as the dbt project path.
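The `-p 3030:3001` flag maps host port 3030 to container port 3001, so the UI is reached at http://localhost:3030 on the host. The container may take a moment to start; the sketch below polls until the UI answers. `wait_for_ui` is a hypothetical helper, and the attempt count and timeout are arbitrary assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of DataLex): poll a URL until it responds.
#   $1 = URL to check (default: the host port used in the docker run examples)
#   $2 = maximum number of attempts, one per second (default: 30)
wait_for_ui() {
  local url="${1:-http://localhost:3030}" attempts="${2:-30}" i
  for ((i = 1; i <= attempts; i++)); do
    # -f makes curl fail on HTTP errors; --max-time bounds each attempt.
    if curl -fsS -o /dev/null --max-time 2 "$url"; then
      echo "UI is responding at $url"
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
  done
  echo "UI did not respond at $url after $attempts attempt(s)" >&2
  return 1
}

# Usage:
#   wait_for_ui http://localhost:3030 30
```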

Core workflow

Connect dbt repo -> AI Setup -> Readiness -> Generate -> Review -> Contracts -> Publish
  1. Connect your dbt repo.
  2. Set up AI with OpenAI, Claude, or Ollama.
  3. Scan readiness from dbt manifest, YAML, metrics, tests, exposures, owners, and contracts.
  4. Generate focused proposal packs for one domain, model group, or metric family.
  5. Review and certify proposals before anything becomes trusted.
  6. Publish datalex-manifest.json from certified contracts.

Readiness scanning works without AI. Generation requires an AI provider that has been configured and tested; without one, DataLex will not fill the gap with fake domains or placeholder contracts.

Figure: DataLex enterprise readiness in the Paper theme.

AI setup

DataLex uses your dbt evidence to generate proposals:

  • target/manifest.json
  • dbt model YAML
  • semantic models and metrics
  • tests and relationships
  • exposures
  • owners and descriptions
  • existing dbt contracts
  • existing DataLex artifacts
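Before pointing DataLex at a repo, you may want to confirm that the basic dbt evidence is present. The sketch below checks only two files, `dbt_project.yml` and `target/manifest.json`; `check_dbt_evidence` is a hypothetical helper, and the list of files DataLex actually requires is an assumption. If the manifest is missing, running `dbt parse` in the project normally regenerates it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of DataLex): report whether the basic dbt
# files exist under a project directory. Returns 0 only if all are present.
check_dbt_evidence() {
  local project_dir="${1:-.}" missing=0 f
  for f in dbt_project.yml target/manifest.json; do
    if [ -f "$project_dir/$f" ]; then
      echo "found    $f"
    else
      echo "missing  $f"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_dbt_evidence ~/path/to/your-dbt-project || echo "try: dbt parse"
```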

Provider settings are project-private and stored under:

<your-dbt-project>/.datalex/agent/provider-settings.json

They are not written under versioned DataLex/, and API responses redact secrets.
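Because the settings file lives inside the dbt repo, it is worth confirming that Git will not pick it up. Whether DataLex adds an ignore rule itself is not stated here, so treat the check below as an optional safeguard; `settings_are_ignored` is a hypothetical helper built on `git check-ignore`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of DataLex): succeed if Git ignores the
# provider settings file in the given project directory.
settings_are_ignored() {
  local project_dir="${1:-.}"
  # check-ignore exits 0 when the path matches an ignore rule, 1 when it does not.
  git -C "$project_dir" check-ignore -q .datalex/agent/provider-settings.json
}

# Usage:
#   settings_are_ignored . || echo '.datalex/' >> .gitignore
```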

Ollama example

ollama pull gemma4:12b
ollama serve

In DataLex, open AI Setup, choose Ollama, set:

Base URL: http://localhost:11434
Model: gemma4:12b

Then click Save and Test.
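If Save and Test fails, you can check outside DataLex that Ollama is reachable and that the model has been pulled. The sketch below queries Ollama's `/api/tags` endpoint, which lists local models, and does a plain string match on the model name; the helper names are hypothetical, and this is not what DataLex's own test does.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of DataLex).

# Reads an Ollama /api/tags JSON response on stdin; succeeds if the quoted
# model name appears anywhere in it (a simple string match, not a JSON parse).
tags_include_model() {
  grep -qF "\"$1\""
}

# Fetches the local model list from Ollama and checks for the given model.
ollama_has_model() {
  local base_url="${1:-http://localhost:11434}" model="$2"
  curl -fsS "$base_url/api/tags" | tags_include_model "$model"
}

# Usage:
#   ollama_has_model http://localhost:11434 gemma4:12b && echo "model is available"
```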

What DataLex writes

New OSS artifacts use this domain-first layout:

DataLex/
  datalex.yaml
  domains/
    commerce.yaml
  commerce/
    conceptual/
    logical/
    physical/
    contracts/
    proposals/
    glossary/
    semantic/
  imported/
    dbt/
  generated/
    dbt/
  generated-sql/
  Skills/

DataLex still reads older layouts for compatibility, but new UI actions write lowercase canonical paths.

Only certified contracts and metric contracts enter datalex-manifest.json. Draft, reviewed, and rejected proposals stay out of the publish manifest.

Publish a manifest

datalex datalex manifest build DataLex --out DataLex/datalex-manifest.json

The manifest is the stable OSS handoff for downstream tools and future cloud flows. DQL is not required in the OSS repo. DataLex only shows DQL readiness when a project explicitly enables that integration.
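Before handing the manifest to a downstream tool, a basic sanity check can catch an empty or truncated file. The manifest's schema is not described here, so the sketch below only verifies that the file exists, is non-empty, and parses as JSON, using Python's standard `json.tool` module; `manifest_is_valid_json` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of DataLex): succeed if the manifest file is
# non-empty and parses as JSON. It does not validate the manifest's contents.
manifest_is_valid_json() {
  local path="${1:-DataLex/datalex-manifest.json}"
  [ -s "$path" ] && python3 -m json.tool "$path" > /dev/null
}

# Usage:
#   manifest_is_valid_json DataLex/datalex-manifest.json && echo "manifest parses"
```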

Tutorials

Start here:

  1. Install and run DataLex
  2. Connect an existing dbt repo
  3. Configure AI with OpenAI, Claude, or Ollama
  4. Generate, review, and certify a proposal pack
  5. Publish the DataLex manifest
  6. Run DataLex with Docker

For the full flow in one place, read Getting started.

End-to-end example

This repo stays product-focused and does not ship a full sample project. To see DataLex and DQL together, use the separate duckcode-ai/jaffle-shop-duckdb repo.

That example contains a dbt + DuckDB project, a reviewed DataLex/ contract pack, a DQL workspace, Paper-theme screenshots, and the full Jaffle Shop tutorial.

For contributors

git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex
python3 -m venv .venv
source .venv/bin/activate
pip install -e '.[serve,duckdb]'
npm --prefix packages/api-server install
npm --prefix packages/web-app install
datalex serve

Useful checks:

npm --prefix packages/api-server test
npm --prefix packages/web-app run build
python3 -m pytest tests/datalex packages/readiness_engine/tests

Links