31 changes: 24 additions & 7 deletions docs/examples/API_Demo.md
@@ -3,7 +3,7 @@
The `examples/api_demo` scenario demonstrates how FastFlowTransform blends local data, external APIs, and multiple execution engines. It highlights:

- **Hybrid data model**: joins a local seed (`crm.users`) with live user data from JSONPlaceholder.
- **Multiple environments**: switch between DuckDB, Postgres, Databricks Spark, and BigQuery (pandas or BigFrames client) using `profiles.yml` + `.env.*`.
- **Multiple environments**: switch between DuckDB, Postgres, Databricks Spark, BigQuery (pandas or BigFrames client), and Snowflake (Snowpark) using `profiles.yml` + `.env.*`.
- **HTTP integration**: compare the built-in FastFlowTransform HTTP client (`api_users_http`) with a plain `requests` implementation (`api_users_requests`).
- **Offline caching & telemetry**: inspect HTTP snapshots via `run_results.json`.
- **Engine-aware registration**: scope Python models via `engine_model` and SQL models via `config(engines=[...])` so only the active engine’s nodes load.
@@ -21,7 +21,8 @@ The `examples/api_demo` scenario demonstrates how FastFlowTransform blends local
'engine:duckdb',
'engine:postgres',
'engine:databricks_spark',
'engine:bigquery'
'engine:bigquery',
'engine:snowflake_snowpark'
]
) }}
select id, email
@@ -32,11 +33,11 @@ The `examples/api_demo` scenario demonstrates how FastFlowTransform blends local
2. **API enrichment** – engine-specific Python implementations under `models/engines/<engine>/`:
- `api_users_http.ff.py` uses the built-in HTTP wrapper (`fastflowtransform.api.http.get_df`) with cache/offline support.
- `api_users_requests.ff.py` uses raw `requests` for maximum flexibility.
- Engine-specific callables are scoped with `engine_model(only=...)` (DuckDB/Postgres/Spark) or `env_match={"FF_ENGINE": "bigquery", "FF_ENGINE_VARIANT": ...}` (BigQuery pandas/BigFrames) to stay isolated per engine.
- Engine-specific callables are scoped with `engine_model(only=...)` (DuckDB/Postgres/Spark/Snowflake) or `env_match={"FF_ENGINE": "bigquery", "FF_ENGINE_VARIANT": ...}` (BigQuery pandas/BigFrames) to stay isolated per engine.

3. **Mart join** – `models/common/mart_users_join.ff.sql`
```sql
{{ config(engines=['duckdb','postgres','databricks_spark','bigquery']) }}
{{ config(engines=['duckdb','postgres','databricks_spark','bigquery','snowflake_snowpark']) }}
{% set api_users_model = var('api_users_model', 'api_users_http') %}
{% set api_users_refs = {
'api_users_http': ref('api_users_http'),
@@ -78,9 +79,21 @@ dev_bigquery_bigframes:
dataset: "{{ env('FF_BQ_DATASET', 'api_demo') }}"
location: "{{ env('FF_BQ_LOCATION', 'EU') }}"
use_bigframes: true

dev_snowflake:
engine: snowflake_snowpark
snowflake_snowpark:
account: "{{ env('FF_SF_ACCOUNT') }}"
user: "{{ env('FF_SF_USER') }}"
password: "{{ env('FF_SF_PASSWORD') }}"
warehouse: "{{ env('FF_SF_WAREHOUSE', 'COMPUTE_WH') }}"
database: "{{ env('FF_SF_DATABASE', 'API_DEMO') }}"
schema: "{{ env('FF_SF_SCHEMA', 'API_DEMO') }}"
role: "{{ env('FF_SF_ROLE', '') }}"
allow_create_schema: true
```

`.env.dev_*` files supply the actual values. `_load_dotenv_layered()` loads them from lowest to highest priority: repo `.env` → project `.env` → `.env.<env>` → shell overrides, so each later source overrides the ones before it. Secrets stay out of version control.
`.env.dev_*` files supply the actual values (including `.env.dev_snowflake` for Snowflake credentials). `_load_dotenv_layered()` loads them from lowest to highest priority: repo `.env` → project `.env` → `.env.<env>` → shell overrides, so each later source overrides the ones before it. Secrets stay out of version control.
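The layering described above can be illustrated with a small standalone sketch. It is not the actual `_load_dotenv_layered()` implementation; the helper names `parse_env_text` and `merge_layers` and the sample values are hypothetical, and the parser only handles plain `KEY=VALUE` lines.

```python
from typing import Dict, Iterable, Mapping


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments (hypothetical helper)."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def merge_layers(layers: Iterable[Mapping[str, str]], shell: Mapping[str, str]) -> Dict[str, str]:
    """Later layers override earlier ones; shell variables override every file."""
    merged: Dict[str, str] = {}
    for layer in layers:
        merged.update(layer)
    merged.update(shell)  # highest priority
    return merged


# Sample contents, invented for illustration only.
repo_env = parse_env_text("FF_SF_WAREHOUSE=COMPUTE_WH\nFF_SF_DATABASE=SHARED\n")
project_env = parse_env_text("FF_SF_DATABASE=API_DEMO\n")
profile_env = parse_env_text("# .env.dev_snowflake\nFF_SF_SCHEMA=API_DEMO\nFF_SF_USER=demo_user\n")
shell_env = {"FF_SF_USER": "ci_user"}

resolved = merge_layers([repo_env, project_env, profile_env], shell_env)
print(resolved)
```

In this sketch the project file wins over the repo file for `FF_SF_DATABASE`, and the shell value wins over `.env.dev_snowflake` for `FF_SF_USER`.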

### BigQuery specifics

@@ -91,7 +104,7 @@ dev_bigquery_bigframes:

## Makefile Workflow

`Makefile` chooses the profile via `ENGINE` (`duckdb`/`postgres`/`databricks_spark`/`bigquery`) and wraps the main commands. For BigQuery, set `BQ_FRAME=pandas|bigframes`:
`Makefile` chooses the profile via `ENGINE` (`duckdb`/`postgres`/`databricks_spark`/`bigquery`/`snowflake_snowpark`) and wraps the main commands. For BigQuery, set `BQ_FRAME=pandas|bigframes`:

```make
ENGINE ?= duckdb
@@ -108,6 +121,9 @@ ifeq ($(ENGINE),bigquery)
PROFILE_ENV = dev_bigquery_bigframes
endif
endif
ifeq ($(ENGINE),snowflake_snowpark)
PROFILE_ENV = dev_snowflake
endif

seed:
uv run fft seed "$(PROJECT)" --env $(PROFILE_ENV)
@@ -122,6 +138,7 @@ Common targets:
| `make ENGINE=duckdb seed`| Materialize seeds into DuckDB. |
| `make ENGINE=postgres run`| Execute the full pipeline against Postgres. |
| `make ENGINE=bigquery run BQ_FRAME=bigframes`| Run against BigQuery (default BigFrames client; set `BQ_FRAME=pandas` to switch). |
| `make ENGINE=snowflake_snowpark run`| Execute the API demo on Snowflake via Snowpark (requires the `fastflowtransform[snowflake]` extra). |
| `make dag` | Render documentation (`site/dag/`). |
| `make api-run` | Run only API models (uses HTTP cache). |
| `make api-offline` | Force offline mode (`FF_HTTP_OFFLINE=1`). |
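As a rough illustration of how the Makefile maps `ENGINE` (and `BQ_FRAME`) to a profile, the selection logic can be restated in Python. This is a sketch, not code from the project: `resolve_profile_env` is a hypothetical name, and the profile names for DuckDB, Postgres, and Databricks Spark are assumed, since only the BigQuery and Snowflake branches appear in the diff.

```python
def resolve_profile_env(engine: str = "duckdb", bq_frame: str = "bigframes") -> str:
    """Return the profile name a given ENGINE / BQ_FRAME combination would select."""
    if engine == "bigquery":
        if bq_frame not in ("pandas", "bigframes"):
            raise ValueError(f"unsupported BQ_FRAME: {bq_frame!r}")
        return f"dev_bigquery_{bq_frame}"
    if engine == "snowflake_snowpark":
        return "dev_snowflake"
    # Assumed names: these branches are not visible in the diff.
    assumed = {
        "duckdb": "dev_duckdb",
        "postgres": "dev_postgres",
        "databricks_spark": "dev_databricks",
    }
    if engine not in assumed:
        raise ValueError(f"unsupported ENGINE: {engine!r}")
    return assumed[engine]


print(resolve_profile_env("snowflake_snowpark"))
print(resolve_profile_env("bigquery", "pandas"))
```

Failing fast on an unknown engine mirrors what a typo in `ENGINE=` should ideally produce, rather than silently falling back to a default profile.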
@@ -131,7 +148,7 @@ HTTP tuning parameters (`FF_HTTP_ALLOWED_DOMAINS`, cache dir, timeouts) live in

## End-to-End Demo

1. **Select engine**: `make ENGINE=duckdb` (default). Set `ENGINE=postgres`, `ENGINE=databricks_spark`, or `ENGINE=bigquery BQ_FRAME=<pandas|bigframes>` to switch.
1. **Select engine**: `make ENGINE=duckdb` (default). Set `ENGINE=postgres`, `ENGINE=databricks_spark`, `ENGINE=bigquery BQ_FRAME=<pandas|bigframes>`, or `ENGINE=snowflake_snowpark` to switch.
2. **Seed data**: `make seed`
3. **Run pipeline**: `make run`
4. **Explore docs**: `make dag` → open `examples/api_demo/site/dag/index.html`
2 changes: 2 additions & 0 deletions docs/examples/Cache_Demo.md
@@ -60,6 +60,8 @@ make change_py # edit py_constants.ff.py -> rebuilds that model
make run_parallel # runs entire DAG with 4 workers per level
```

> Engines: set `ENGINE=<duckdb|postgres|databricks_spark|bigquery|snowflake_snowpark>` and copy the matching `.env.dev_*` file (`.env.dev_snowflake` for Snowflake; install `fastflowtransform[snowflake]`).

Seeds stay immutable: `change_seed` assembles a temporary combined copy in `.local/seeds` using
`patches/seed_users_patch.csv`, so the repo stays clean while fingerprints still change.

34 changes: 32 additions & 2 deletions docs/examples/DQ_Demo.md
@@ -56,6 +56,7 @@ examples/dq_demo/
.env.dev_databricks
.env.dev_bigquery_pandas
.env.dev_bigquery_bigframes
.env.dev_snowflake
Makefile # optional, convenience wrapper around fft commands
profiles.yml
project.yml
@@ -107,7 +108,9 @@ examples/dq_demo/
'scope:staging',
'engine:duckdb',
'engine:postgres',
'engine:databricks_spark'
'engine:databricks_spark',
'engine:bigquery',
'engine:snowflake_snowpark'
],
) }}

@@ -136,7 +139,9 @@ Aggregates orders per customer and prepares data for reconciliation + freshness:
'scope:mart',
'engine:duckdb',
'engine:postgres',
'engine:databricks_spark'
'engine:databricks_spark',
'engine:bigquery',
'engine:snowflake_snowpark'
],
) }}

@@ -513,6 +518,31 @@ To run the same demo on BigQuery:

Both profiles accept `allow_create_dataset` in `profiles.yml` if you want the example to create the dataset automatically.

## Snowflake Snowpark variant

To run on Snowflake:

1. Copy `.env.dev_snowflake` to `.env` and populate:
```bash
FF_SF_ACCOUNT=<account>
FF_SF_USER=<user>
FF_SF_PASSWORD=<password>
FF_SF_WAREHOUSE=COMPUTE_WH
FF_SF_DATABASE=DQ_DEMO
FF_SF_SCHEMA=DQ_DEMO
FF_SF_ROLE=<optional-role>
```
2. Install the Snowflake extra if needed:
```bash
pip install "fastflowtransform[snowflake]"
```
3. Run via the Makefile:
```bash
make demo ENGINE=snowflake_snowpark
```

The Snowflake profile enables `allow_create_schema`, so the schema is created automatically on first run, provided the configured role has permission to create schemas in the target database.
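Before running the demo it can help to confirm that the credentials were actually filled in. The following is a minimal sketch, assuming that account, user, and password are the variables without defaults (as in the `api_demo` profile) and that unfilled placeholders look like `<account>`; `missing_snowflake_vars` is a hypothetical helper, not part of FastFlowTransform.

```python
from typing import List, Mapping

# Assumption: these three have no default in profiles.yml; the rest fall back to defaults.
REQUIRED_VARS = ("FF_SF_ACCOUNT", "FF_SF_USER", "FF_SF_PASSWORD")


def missing_snowflake_vars(env: Mapping[str, str]) -> List[str]:
    """Return required variables that are unset, empty, or still a <placeholder>."""
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        is_placeholder = value.startswith("<") and value.endswith(">")
        if not value or is_placeholder:
            missing.append(name)
    return missing


sample_env = {
    "FF_SF_ACCOUNT": "<account>",   # placeholder left in place
    "FF_SF_USER": "demo_user",
    "FF_SF_WAREHOUSE": "COMPUTE_WH",
}
print(missing_snowflake_vars(sample_env))
```

To check the real environment, pass `os.environ` instead of `sample_env` after the `.env` files have been loaded.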

## Things to experiment with

To understand the tests better, intentionally break the data and re-run `fft test`:
23 changes: 22 additions & 1 deletion docs/examples/Incremental_Demo.md
@@ -1,6 +1,6 @@
# Incremental, Delta & Iceberg Demo

This example project shows how to use **incremental models** and **Delta-/Iceberg-style merges** in FastFlowTransform across DuckDB, Postgres, Databricks Spark (Parquet, Delta & Iceberg), and BigQuery (pandas or BigFrames).
This example project shows how to use **incremental models** and **Delta-/Iceberg-style merges** in FastFlowTransform across DuckDB, Postgres, Databricks Spark (Parquet, Delta & Iceberg), BigQuery (pandas or BigFrames), and Snowflake Snowpark.


It is intentionally small and self-contained so you can copy/paste patterns into your own project.
@@ -26,6 +26,7 @@ incremental_demo/
.env.dev_databricks_iceberg
.env.dev_bigquery_pandas
.env.dev_bigquery_bigframes
.env.dev_snowflake
Makefile
profiles.yml
project.yml
@@ -51,6 +52,8 @@ incremental_demo/
fct_events_py_incremental.ff.py
bigframes/
fct_events_py_incremental.ff.py
snowflake_snowpark/
fct_events_py_incremental.ff.py
```

*Your actual filenames may differ slightly; the concepts are the same.*
@@ -79,6 +82,7 @@ The demo revolves around a tiny `events` dataset and three different ways to bui
* DuckDB / Postgres: incremental insert/merge in SQL
* Databricks Spark: `MERGE INTO` for Delta or Iceberg where available (Spark 4), with a fallback full-refresh strategy for other formats
* BigQuery: pandas- or BigFrames-backed DataFrame models with incremental merge logic handled by the BigQuery executor
* Snowflake Snowpark: Snowpark DataFrame operations with merges handled by the Snowflake executor

4. **Iceberg profile for Spark 4**

@@ -134,6 +138,8 @@ Conceptually:
'engine:duckdb',
'engine:postgres',
'engine:databricks_spark',
'engine:bigquery',
'engine:snowflake_snowpark'
],
) }}

@@ -273,6 +279,8 @@ Here the model body only defines the **canonical SELECT** and does *not* contain
'engine:duckdb',
'engine:postgres',
'engine:databricks_spark',
'engine:bigquery',
'engine:snowflake_snowpark',
],
) }}

@@ -581,6 +589,19 @@ FF_ENGINE=bigquery FF_ENGINE_VARIANT=bigframes FFT_ACTIVE_ENV=dev_bigquery_bigfr

Ensure the service account credentials pointed to by `GOOGLE_APPLICATION_CREDENTIALS` can create/drop tables in the target dataset.

### Snowflake Snowpark

```bash
# Seed / run / test (Snowflake profile)
FFT_ACTIVE_ENV=dev_snowflake FF_ENGINE=snowflake_snowpark fft seed .
FFT_ACTIVE_ENV=dev_snowflake FF_ENGINE=snowflake_snowpark fft run . \
--select tag:example:incremental_demo --select tag:engine:snowflake_snowpark --cache rw
FFT_ACTIVE_ENV=dev_snowflake FF_ENGINE=snowflake_snowpark fft test . \
--select tag:example:incremental_demo
```

Make sure `.env.dev_snowflake` sets the required `FF_SF_*` variables and install `fastflowtransform[snowflake]` so the Snowpark executor and client libraries are available.
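For scripted runs (for example in CI), the three shell commands above could be driven from Python. This is only a sketch: `build_commands` and `run_all` are hypothetical helper names, the arguments are copied from the commands above, and the default `dry_run=True` prints the commands without invoking `fft`.

```python
import os
import shlex
import subprocess
from typing import Dict, List

# Environment taken from the shell commands above.
SNOWFLAKE_ENV: Dict[str, str] = {
    "FFT_ACTIVE_ENV": "dev_snowflake",
    "FF_ENGINE": "snowflake_snowpark",
}


def build_commands(project: str = ".") -> List[List[str]]:
    """Return the seed, run, and test invocations as argv lists."""
    demo_tag = "tag:example:incremental_demo"
    return [
        ["fft", "seed", project],
        ["fft", "run", project, "--select", demo_tag,
         "--select", "tag:engine:snowflake_snowpark", "--cache", "rw"],
        ["fft", "test", project, "--select", demo_tag],
    ]


def run_all(project: str = ".", dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    env = {**os.environ, **SNOWFLAKE_ENV}
    for argv in build_commands(project):
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(argv, env=env, check=True)


commands = build_commands()
run_all(dry_run=True)
```

Passing argv lists instead of a shell string avoids quoting issues with the `tag:` selectors.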

### Databricks Spark

```bash