Closed
3 changes: 3 additions & 0 deletions .gitignore
@@ -8,6 +8,9 @@ dist/
# Node
node_modules/
packages/web-app/dist/
packages/web-app/test-results/
packages/web-app/playwright-report/
packages/web-app/playwright/.cache/

# Virtual environment
.venv/
191 changes: 150 additions & 41 deletions CONTRIBUTING.md
@@ -1,52 +1,161 @@
# Contributing to DataLex

Thanks for contributing. This project includes a React web app, a Node API server, and a Python core/CLI.

## Development Setup
1. Clone the repository and enter the project root.
2. Install Node dependencies:
- `npm --prefix packages/api-server install`
- `npm --prefix packages/web-app install`
3. Create Python venv and install requirements:
- `python3 -m venv .venv`
- `source .venv/bin/activate`
- `pip install -r requirements.txt`

## Run Locally
- API: `npm --prefix packages/api-server run dev`
- Web: `npm --prefix packages/web-app run dev`
- CLI example: `./datalex validate model-examples/starter-commerce.model.yaml`

## Branch and PR Flow
Thanks for contributing. This project is a monorepo with four pieces:

- `packages/core_engine/` — Python loader, dialects, dbt integration, packages
- `packages/api-server/` — Node.js API the web UI talks to
- `packages/web-app/` — React/Vite studio (Zustand + React Flow)
- `packages/cli/` — `datalex` entry point

## Development setup

### Prerequisites

- **Python 3.9+** with `pip` and `venv`
- **Node 20+** (`nvm use 20` if you use nvm)
- **Git**

### One-time bootstrap

```bash
git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex
python3 -m venv .venv && source .venv/bin/activate
pip install -e '.[serve,duckdb]' # core_engine + CLI + connector
npm --prefix packages/api-server install
npm --prefix packages/web-app install
```

`pip install -e '.[serve,duckdb]'` installs `datalex_core` + the `datalex`
CLI in editable mode and pulls the bundled Node runtime used by
`datalex serve`. Add more connector extras as needed:
`'.[serve,postgres,snowflake,bigquery,databricks]'`.

## Running locally

Two supported modes:

### Single-command (production-like)

```bash
datalex serve --project-dir .
```

Serves the API and the pre-built web bundle together on
`http://localhost:3030`. Uses the portable Node that `[serve]`
installed. Good for smoke-testing a change in a real browser.

### Hot-reload (for UI work)

Two terminals:

```bash
# Terminal 1 — API on :3006
npm --prefix packages/api-server run dev

# Terminal 2 — Vite dev server on :5173 with HMR
# (vite.config.js proxies /api → :3006 for you)
npm --prefix packages/web-app run dev
```

Open `http://localhost:5173`. The Vite proxy forwards every `/api/*`
call to the api-server, so the UI talks to the live backend while
HMR rebuilds React components on save.

CLI during development (for package-level hacks): `./datalex <cmd>`.

## Testing

### Python (core_engine + datalex)

```bash
python3 -m unittest -v tests/test_mvp.py tests/test_cli_dx.py tests/test_policy_engine_v2.py
./datalex validate-all --schema schemas/model.schema.json
```

### API server

```bash
npm --prefix packages/api-server test
```

### Web app — unit tests (fast, no browser)

```bash
npm --prefix packages/web-app test
```

Runs everything in `packages/web-app/tests/*.test.js` via Node's
built-in test runner.

### Web app — Playwright end-to-end (local-dev only)

```bash
# One-time: install browsers
npx --prefix packages/web-app playwright install chromium

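# Clone + parse the jaffle-shop fixture once (needs dbt-duckdb).
# The subshell keeps your working directory at the repo root, so the
# relative --prefix path in the next command still resolves.
(
  cd packages/web-app/test-results/jaffle-shop   # created by global-setup
  pip install dbt-duckdb && dbt deps && dbt parse --profiles-dir .
)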

# Run the suite (starts api + vite via Playwright webServer)
npm --prefix packages/web-app run test:e2e
```

The E2E suite clones `https://github.com/dbt-labs/jaffle-shop` into
`packages/web-app/test-results/jaffle-shop/` on first run and reuses
the checkout afterwards. It drives the real user journey: import →
diagram → (with `E2E_FULL=1`) rename cascade → autosave → auto-commit
→ dry-run apply.
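
Per the paragraph above, `E2E_FULL=1` turns on the longer journey. A small wrapper can make the two modes explicit. This is only a sketch: `e2e` and `DRY_RUN` are hypothetical names that do not exist in the repo; only `E2E_FULL` and the `test:e2e` script come from the text.

```bash
# Hypothetical wrapper; not part of DataLex.
# Usage: e2e          -> default journey
#        e2e full     -> sets E2E_FULL=1 for the longer journey
# Set DRY_RUN=1 to print the command instead of running it.
e2e() {
  mode="${1:-quick}"
  cmd="npm --prefix packages/web-app run test:e2e"
  if [ "$mode" = "full" ]; then
    cmd="E2E_FULL=1 $cmd"
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$cmd"
  else
    eval "$cmd"
  fi
}
```

Run it from the repo root, since the `--prefix` path is relative.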

**CI does not run this suite.** It requires dbt-core + a parsed
manifest on disk, which is too heavy and too flaky for every PR. The
backend contracts are already covered by api-server unit tests; the
Playwright suite is a local-dev regression tool for UI changes.
See [packages/web-app/e2e/README.md](packages/web-app/e2e/README.md).

## Branch and PR flow

1. Create a branch from `main`.
2. Keep changes focused and atomic.
3. Add or update tests for behavior changes.
4. Run relevant checks before opening PR.
5. Open PR with clear summary, impact, and test evidence.

## Recommended Checks
- Python unit tests:
- `python3 -m unittest -v tests/test_mvp.py tests/test_cli_dx.py tests/test_policy_engine_v2.py`
- Web tests:
- `npm --prefix packages/web-app test`
- Validate example models:
- `./datalex validate-all --schema schemas/model.schema.json`

## Commit Style
- Use short, imperative commit messages.
- Include scope when helpful, for example: `docs: update security policy`.

## Coding Expectations
- Keep changes backward compatible unless a breaking change is explicitly discussed.
3. Add or update tests for behavior changes (unit **and** E2E when the
change affects the UI loop).
4. Run the relevant test suites locally before opening the PR.
5. Open the PR with a clear summary, user-visible impact, and test
evidence.

CI runs on every PR:

- `api-server-tests.yml` — `packages/api-server/` unit tests
- `model-quality.yml` — core_engine unit tests + policy checks
- `datalex.yml` — `datalex validate` / `diff` / `dbt emit` on touched
DataLex projects

Playwright E2E is **not** in CI — run it locally before opening a PR
that changes import, canvas, or save-path behavior.

## Commit style

- Short, imperative commit messages.
- Include scope when helpful: `docs: update contributing guide`,
`web-app: drop bundled jaffle-shop fixture`.

## Coding expectations

- Keep changes backward compatible unless a breaking change is
explicitly discussed.
- Update docs/examples when behavior or CLI output changes.
- Avoid committing secrets, credentials, or local environment files.

## Reporting Bugs and Requesting Features
## Reporting bugs / requesting features

- Open a GitHub issue with reproduction steps and expected behavior.
- For connector issues, include connector type, redacted config, and failing command/log excerpt.
- For connector issues, include connector type, redacted config, and
the failing command/log excerpt.

## Cutting a release

See [RELEASING.md](RELEASING.md) for the full process. Short version:
bump `project.version` in `pyproject.toml`, move items from `[Unreleased]`
into a new dated section in `CHANGELOG.md`, merge, then push a signed
`vX.Y.Z` tag. CI publishes to PyPI automatically.
bump `project.version` in `pyproject.toml`, move items from
`[Unreleased]` into a new dated section in `CHANGELOG.md`, merge, then
push a signed `vX.Y.Z` tag. CI publishes to PyPI automatically.
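
The tagging step could be guarded with a shape check before signing. The following is a minimal sketch, not the project's release tooling: `is_release_tag` and `push_release_tag` are hypothetical helper names, and the remote name `origin` is an assumption. `git tag -s` creates a GPG-signed tag.

```bash
# Hypothetical helpers; not part of DataLex.

# Succeeds only when the argument looks like vX.Y.Z (digits only).
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# Creates a signed tag and pushes it. Assumes the remote is named "origin".
push_release_tag() {
  tag="$1"
  if ! is_release_tag "$tag"; then
    echo "refusing to tag: '$tag' is not of the form vX.Y.Z" >&2
    return 1
  fi
  git tag -s "$tag" -m "Release $tag" && git push origin "$tag"
}
```

Nothing is tagged until `push_release_tag` is called explicitly; see RELEASING.md for the authoritative steps.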
5 changes: 3 additions & 2 deletions README.md
@@ -87,7 +87,7 @@ have in hand:

| You have... | Tutorial | Time |
|--------------------------------------------|--------------------------------------------------------------------|-------|
| Nothing — just want the demo | [Jaffle-shop one-click walkthrough](docs/tutorials/jaffle-shop-walkthrough.md) | 3 min |
| Nothing — want to try with a known-good dbt repo | [Walk through jaffle-shop end-to-end](docs/tutorials/jaffle-shop-walkthrough.md) | 5 min |
| An existing dbt project (folder or git) | [Import an existing dbt project](docs/tutorials/import-existing-dbt.md) | 5 min |
| A live warehouse (Snowflake/Postgres/…) | [Pull a warehouse schema](docs/tutorials/warehouse-pull.md) | 7 min |
| CLI-only, no UI | [CLI dbt-sync tutorial](docs/tutorial-dbt-sync.md) | 5 min |
@@ -242,7 +242,8 @@ dbt parse
- **[Getting started](docs/getting-started.md)** — the one-page map
covering install, the three GUI paths, and the mental model.
- **[Jaffle-shop walkthrough](docs/tutorials/jaffle-shop-walkthrough.md)** —
3-minute offline demo of every UI feature.
end-to-end demo: clone the real jaffle-shop repo, import it, rename an
entity, commit back to git.
- **[Import an existing dbt project](docs/tutorials/import-existing-dbt.md)** —
5-minute bring-your-own-repo flow (local folder or git URL).
- **[Pull a warehouse schema](docs/tutorials/warehouse-pull.md)** —
26 changes: 15 additions & 11 deletions docs/getting-started.md
@@ -28,31 +28,35 @@ pip install 'datalex-cli[serve,all]' # every driver + Node

| You have... | Start here | Time |
|-----------------------------------------------|----------------------------------------------------------------|-------|
| Nothing — just want the demo | [Scenario 1 — jaffle-shop demo](#scenario-1--jaffle-shop-demo) | 3 min |
| Nothing — want to try with a canonical dbt repo | [Scenario 1 — clone jaffle-shop](#scenario-1--clone-jaffle-shop) | 5 min |
| An existing dbt project on disk | [Scenario 2 — your local dbt repo](#scenario-2--your-local-dbt-repo) | 5 min |
| A dbt repo on GitHub you want to try | [Scenario 3 — a git URL](#scenario-3--a-git-url) | 4 min |
| A live warehouse, no dbt yet | [Scenario 4 — warehouse pull](#scenario-4--live-warehouse-pull) | 7 min |
| CLI only, no UI | [CLI dbt-sync tutorial](tutorial-dbt-sync.md) | 5 min |

---

## Scenario 1 — Jaffle-shop demo
## Scenario 1 — Clone jaffle-shop

The fastest way to see if DataLex fits how you think. No dbt repo
needed, no warehouse, fully offline.
The fastest way to see if DataLex fits how you think. Uses the real
`dbt-labs/jaffle-shop` repo — no bundled demo, no surprises when you
switch to your own project later.

```bash
pip install 'datalex-cli[serve]'
git clone https://github.com/dbt-labs/jaffle-shop ~/src/jaffle-shop
datalex serve
```

Browser opens. Click **Import dbt repo → Load jaffle-shop demo**. The
Explorer fills with `models/staging/`, `models/marts/`, the canvas
shows an ER diagram with relationships, and the inspector renders
every column.

Nothing is written to disk. Close the tab and everything is gone.
When you want the real workflow, go to Scenario 2.
In the UI: **Import dbt repo → Git URL tab** → paste
`https://github.com/dbt-labs/jaffle-shop` → **Import**. The API
server clones on your behalf, runs the importer, and shows the
**Import Results** panel. Click **Open project**.

Want Save All to write to disk? Use the **Local folder** tab instead,
point it at the clone you just made (`~/src/jaffle-shop`), and keep
**Edit in place** checked. Every UI edit then lands in the clone, and
`git diff` shows normal dbt changes.
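
To see what an edit-in-place session changed, something like the following could be used. It is a sketch under assumptions: `clone_changes` is a hypothetical helper name, and the default path is the `~/src/jaffle-shop` clone from the commands above.

```bash
# Hypothetical helper; not part of DataLex.
# Summarizes uncommitted changes in a clone (default: ~/src/jaffle-shop).
clone_changes() {
  dir="${1:-$HOME/src/jaffle-shop}"
  git -C "$dir" status --short   # new, modified, and deleted files
  git -C "$dir" diff --stat      # per-file summary for tracked files
}
```

Call `clone_changes` after saving in the UI, or pass another directory as the first argument.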

📖 **Full walkthrough:** [tutorials/jaffle-shop-walkthrough.md](tutorials/jaffle-shop-walkthrough.md)
