28 changes: 28 additions & 0 deletions _data/navigation.yml
@@ -587,6 +587,11 @@ items:

- url: /transformations/variables/
  title: Variables & Shared Code
  items:
    - url: /transformations/variables/how-to/
      title: How do I use variables and shared code?
    - url: /transformations/variables/explanation/
      title: What they are

- url: /transformations/dbt/
  title: dbt Transformation
@@ -608,10 +613,19 @@ items:

- url: /transformations/python-plain/
  title: Python Transformations
  items:
    - url: /transformations/python-plain/how-to/
      title: How do I run a Python transformation?
    - url: /transformations/python-plain/reference/
      title: Reference

- url: /transformations/r-plain/
  title: R Transformations
  items:
    - url: /transformations/r-plain/how-to/
      title: How do I run an R transformation?
    - url: /transformations/r-plain/reference/
      title: Reference
    - url: /transformations/r-plain/array-splitter/
      title: Array Splitting

@@ -633,9 +647,23 @@ items:

- url: /transformations/bigquery/
  title: BigQuery Transformations
  items:
    - url: /transformations/bigquery/how-to/
      title: How do I run a BigQuery transformation?
    - url: /transformations/bigquery/reference/
      title: Reference

- url: /transformations/duckdb/
  title: DuckDB Transformations
  items:
    - url: /transformations/duckdb/how-to/
      title: How do I run a DuckDB transformation?
    - url: /transformations/duckdb/reference/
      title: Reference
    - url: /transformations/duckdb/explanation/
      title: When to use it
    - url: /transformations/duckdb/snowflake-migration/
      title: Snowflake to DuckDB Migration

- url: /transformations/oracle/
  title: Oracle Transformations
81 changes: 81 additions & 0 deletions revamp/PRDCT-376-human-review.md
@@ -0,0 +1,81 @@
# PRDCT-376 — consolidated human-review queue

Every `TODO(human-review)` marker left inline across the Transformations section
split, grouped for sign-off. The markers remain in-source; this is the index.

> **Reconciled with PRDCT-354 (Devin "Audit vs code").** Page `type`s were
> aligned to Block 0 (added a `tutorial` type; retyped cli, r-plain
> array-splitter/binary/plots, duckdb/snowflake-migration → tutorial;
> troubleshooting → how-to; flows → explanation). Items below that Block B
> verified against component code are marked **Resolved**; only the genuinely
> unverifiable ones remain open.

## Resolved via PRDCT-354 audit (Block B — verified vs code)

- **Snowflake** (`snowflake-plain/reference.md`): `query_timeout=7200`,
`ABORT_TRANSFORMATION`, and copy/clone loading types — confirmed.
- **BigQuery** (`bigquery/reference.md`): `Query timeout` parameter default `0`
and `ABORT_TRANSFORMATION` (STRING DEFAULT '') — confirmed.
- **DuckDB** (`duckdb/reference.md`): `threads`, `max_memory_mb`, `dtypes_infer`,
`debug`, `syntax_check`, `duckdb_version`, **supported versions {1.5.2, 1.4.4}**,
the 4 sync actions, and block orchestration — confirmed.
- **Oracle** (`oracle/index.md`): the optional `schema` field is `db.schema`
(`scalarNode('schema')`) — confirmed.

## A. Reference facts still unverifiable (platform-level / not in code audit)

- **Snowflake** (`snowflake-plain/reference.md`): backend sizes + default;
8,192-char comment segfault; AWS-US timestamp parameter overrides; Free-Plan
backend availability.
- **BigQuery** (`bigquery/reference.md`): the "2 hours" GCP query-runtime claim
(platform-side; current GCP quota may be 6 h).
- **DuckDB** (`duckdb/reference.md`): backend sizes / memory figures; default Timeout (1 h).
- **Python** (`python-plain/reference.md`): current Python version; 8 GB memory /
6 h / CPU limits; preinstalled package list; backend sizes / default / plan.
- **R** (`r-plain/reference.md`): R `4.4.1` confirmed (PRDCT-354 Block A, bumped
4.0.5 → 4.4.1); 16 GB / 6 h / CPU limits; preinstalled packages; backend sizes.

## B. UI labels / control names to confirm (how-to pages)

In `snowflake-plain/how-to.md`, `bigquery/how-to.md`, `duckdb/how-to.md`,
`python-plain/how-to.md`, `r-plain/how-to.md`, `oracle/index.md`: the literal
navigation/label strings — e.g. **Components → Transformations**, **New
Transformation**, the per-backend type label (e.g. "Snowflake SQL Transformation"),
**New Table Input/Output**, **Create Table** — were written from convention and
need a quick check against the live UI. Also the DuckDB sample CSV's column names.

## C. dbt screenshots — alt text unverified

The images were not viewable while editing, so alt text on **kept** screenshots is
context-derived and prefixed `TODO(human-review: alt unverified)`. Verify each
against the actual image (12 kept):

- `dbt/transformation/transformation.md` (10): configuration overview, database
connection, project repository, load branches, execution steps, step edit,
freshness, output mapping, run panel, Discover timeline.
- `dbt/cloud/cloud.md` (2): dbt Cloud Trigger config, dbt Cloud API source connector config.

`dbt/cli/cli.md`: the `kbc dbt init` outputs (env vars, `profiles.yml`, generated
`models/_sources/*.yml`) are now transcribed to fenced blocks with masked
placeholders.

`code-patterns/index.md`: `TODO(human-review: transcribe generated-code screenshot)` —
the "Generated Code" screenshot shows code (should be a fenced block); content is
code-pattern-specific and wasn't viewable here, so add a representative example.

## D. Content correctness

- `transformations/index.md`: the first "Other features" table row was malformed
(no feature name, an extra cell, rowspan 9 vs 8 real features). It was removed and
the rowspan corrected to 8. **If it represented a real feature, re-add it with the
correct name/value.**

## E. Diátaxis mapping note (not a defect)

- **python-plain** and **r-plain** were Block-0-tagged how-to + reference + **tutorial**.
At the time of the split, page `type` was constrained to how-to | reference | explanation,
so the tutorial/dev-walkthrough facet was folded into each **how-to** ("Develop and
debug"). The PRDCT-354 reconciliation has since added a `tutorial` type, so confirm the
fold is acceptable, or split out a dedicated tutorial page.
- **transformations/index.md** was kept as a single combined landing (not URL-split
into explanation+reference) to preserve load-bearing anchors such as
`#writing-scripts` referenced by every backend how-to.
2 changes: 1 addition & 1 deletion src/content.config.ts
@@ -16,7 +16,7 @@ export const collections = {
// Docs revamp (Diátaxis) — every revamped page declares the single
// reader need it serves, plus user-vocabulary keywords for search/RAG.
keywords: z.array(z.string()).optional(),
- type: z.enum(['how-to', 'reference', 'explanation']).optional(),
+ type: z.enum(['how-to', 'reference', 'explanation', 'tutorial']).optional(),
}),
}),
}),
83 changes: 83 additions & 0 deletions src/content/docs/transformations/bigquery/how-to.md
@@ -0,0 +1,83 @@
---
title: How do I run a BigQuery transformation?
slug: 'transformations/bigquery/how-to'
description: Create, configure, and run a Google BigQuery SQL transformation in Keboola from start to finish — set input mapping, write the SQL, set output mapping, run it, and confirm the result table landed in Storage.
keywords:
- run a BigQuery transformation
- create BigQuery transformation
- BigQuery SQL transformation Keboola
- BigQuery transformation example
type: how-to
---

You have a table in Keboola Storage and you want to transform it with BigQuery SQL and write the result back to Storage. This page takes you from nothing to a finished, successful run using a small worked example. For exact limits and syntax rules, see the [reference](/transformations/bigquery/reference/).

**Time:** ~10 minutes · **You will need:** a Keboola project (on a BigQuery backend) where you can create configurations, and one table in [Storage](/storage/tables/) to read from.

## Before you start

Get a table into Storage to use as the input. If you do not have one handy, upload the [sample CSV file](/transformations/source.csv) as a new table (Storage → your bucket → **Create Table**) — the example SQL below expects a `source` table with `first` and `second` columns. <!-- TODO(human-review): confirm the "Create Table" control label and bucket → table path. -->

## Step 1 — Create the transformation

1. Open **Components → Transformations**. <!-- TODO(human-review): confirm top-level nav label. -->
2. Click **New Transformation**.
3. Choose **Google BigQuery Transformation** as the type. <!-- TODO(human-review): confirm the exact type label in the picker. -->
4. Give it a descriptive name and confirm.

## Step 2 — Add the input mapping

1. In **Input Mapping**, click **New Table Input**. <!-- TODO(human-review): confirm control label. -->
2. Set **Source** to your Storage table.
3. Set the **Destination** (staging table name) to `source`.
4. Save the mapping.

## Step 3 — Write the SQL script

In the code editor, paste:

```sql
CREATE OR REPLACE TABLE `result` AS
SELECT `first`, CAST(`second` AS INT64) * 42 AS `larger_second`
FROM `source`;
```

This reads the staged `source` table and creates a `result` table with two columns: `first`, and `larger_second` (`second` cast to `INT64` and multiplied by 42). Quote identifiers with backticks (`` `source` ``). You can split longer scripts into [blocks](/transformations/#writing-scripts).
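
`CAST` raises an error if any `second` value cannot be converted to an integer, which fails the whole run. As a hedged variant, assuming you would rather keep such rows with a `NULL` than stop the transformation, BigQuery's `SAFE_CAST` returns `NULL` when the conversion fails. The table and column names are the ones used in the script above; whether `NULL` values are acceptable downstream is an assumption about your data.

```sql
-- Sketch: same output shape as the script above, but tolerant of bad input.
-- SAFE_CAST returns NULL instead of raising an error when `second`
-- cannot be converted to INT64; NULL * 42 stays NULL.
CREATE OR REPLACE TABLE `result` AS
SELECT
  `first`,
  SAFE_CAST(`second` AS INT64) * 42 AS `larger_second`
FROM `source`;
```

Keep the plain `CAST` when a malformed value should stop the run rather than pass through silently.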

## Step 4 — Add the output mapping

1. In **Output Mapping**, click **New Table Output**. <!-- TODO(human-review): confirm control label. -->
2. Set **Source** (the staging table the script created) to `result`.
3. Set **Destination** to a new Storage table, for example `out.c-main.result`.
4. Save the mapping.

## Step 5 — Run it and confirm the result

1. Click **Run** on the transformation.
2. Wait for the [job](/management/jobs/) to finish with a success status.
3. Open **Storage**, find your destination table (`out.c-main.result`), and check the data sample: it should contain `first` and `larger_second`, with `larger_second` equal to `second × 42`.

If the table is there with the expected values, the transformation works.

## Adjust the query timeout

By default, a BigQuery query is capped at BigQuery's own maximum runtime. To raise or lower the cap for this configuration, set the **Query timeout** parameter; see [limits](/transformations/bigquery/reference/#limits).

## Stop a run on a condition

To abort deliberately (for example, when an integrity check fails) and return a user error, set the `ABORT_TRANSFORMATION` variable in your script. See [aborting execution](/transformations/bigquery/reference/#aborting-execution-abort_transformation).
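
As a minimal sketch of that pattern applied to this page's example: the statement below, placed after the `CREATE OR REPLACE TABLE` statement, sets `ABORT_TRANSFORMATION` to a message when `result` has no rows. The empty-table check and the message text are illustrative assumptions; substitute whatever integrity rule fits your data. The variable is already declared by the transformation engine, so the script only sets it.

```sql
-- Hypothetical integrity check: abort when the script produced no rows.
-- An empty string leaves the run untouched; any non-empty string
-- is returned as the user error message.
SET ABORT_TRANSFORMATION = (
  SELECT IF(COUNT(*) > 0, '', 'Result table is empty')
  FROM `result`
);
```

The engine checks the variable after each successfully executed query, so a non-empty message stops the run right after this statement.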

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `Not found: Table source` (or similar) | Input mapping destination doesn't match the script | Make sure the input **Destination** is exactly `source` and the script references `` `source` ``. |
| Run succeeds but nothing appears in Storage | No output mapping, or wrong **Source** staging name | Add an output mapping whose **Source** matches the table your script created (`result`). |
| Query exceeds the time limit | Long-running query past the BigQuery maximum | Optimize the query, or raise the **Query timeout** parameter ([reference](/transformations/bigquery/reference/#limits)). |
| Transformation aborted with a user error | `ABORT_TRANSFORMATION` was set to a non-empty value | Expected if you use the abort pattern; otherwise check the logic that sets it. |

## Related

- [BigQuery transformation reference](/transformations/bigquery/reference/) — limits, data types, UDFs.
- [Input and output mapping](/transformations/mappings/) — how staging works.
- [Tutorial: Manipulating data](/tutorial/manipulate/) — guided first transformation.
119 changes: 11 additions & 108 deletions src/content/docs/transformations/bigquery/index.md
@@ -1,116 +1,19 @@
---
title: Google BigQuery Transformation
slug: 'transformations/bigquery'
description: Run SQL against Google BigQuery in Keboola. Start here, then jump to the how-to or the reference.
keywords:
- BigQuery transformation
- BigQuery transformations
- Google BigQuery SQL transformation
type: explanation
---

A **BigQuery transformation** runs your SQL against Google BigQuery — a fully managed, serverless, auto-scaling data warehouse — while Keboola handles input/output mapping to and from [Storage](/storage/tables/). It suits analytics over large datasets and integrates with the wider Google Cloud ecosystem.

This page is split by what you need:

[BigQuery](https://cloud.google.com/bigquery) offers a range of features:
- **[How do I run a BigQuery transformation?](/transformations/bigquery/how-to/)** — create, configure, and run one end to end, with a worked example and troubleshooting.
- **[BigQuery transformation reference](/transformations/bigquery/reference/)** — query limits, the abort variable, data-type casting, and user-defined functions.

- Fully managed, serverless data warehouse
- Automatic scaling of compute resources
- Storage and analysis of multi-terabyte datasets
- High-speed streaming insertion of data
- Integrates with Google's data analytics ecosystem

## Limits
- By default, individual queries have a [maximum run time](https://cloud.google.com/bigquery/quotas#query_jobs) of 2 hours, but you can adjust this using the *Query timeout* parameter.
- There is a [limit on the number of tables](https://cloud.google.com/bigquery/quotas#tables) referenced by a single query.
- While table updates are possible, BigQuery favors an append-only model where mutations are [generally discouraged](https://cloud.google.com/bigquery/docs/best-practices-costs#avoid_using_dml).

BigQuery is designed for flexibility and ease of use. Its integration with other Google Cloud services provides a robust platform for analytics at scale. To keep up with the latest improvements and updates, it's a good idea to monitor the [BigQuery release notes](https://cloud.google.com/bigquery/docs/release-notes).

For information on BigQuery limitations within Keboola, refer to the [BigQuery Limitations](/storage/byodb/#bigquery-limitations) section.

## Aborting Transformation Execution
In some cases, you may need to abort the transformation execution and exit with an error message.
To abort the execution, set the `ABORT_TRANSFORMATION` variable to any nonempty string value. The variable is already declared internally, so you only need to set its value.

```sql
SET ABORT_TRANSFORMATION = (
SELECT IF(COUNT(*) = 0, '', 'Integrity check failed')
FROM INTEGRITY_CHECK
WHERE RESULT = 'failed'
);
```

This example will set the `ABORT_TRANSFORMATION` variable value to `'Integrity check failed'` if the `INTEGRITY_CHECK` table
contains one or more records with the `RESULT` column equal to the value `'failed'`.

The transformation engine checks `ABORT_TRANSFORMATION` after each successfully executed query and returns the variable's value
as a user error, `Transformation aborted: Integrity check failed.` in this case.

![Screenshot - Transformation aborted](/transformations/bigquery/abort.png)

## Example
To create a simple BigQuery transformation, follow these steps:

- Create a table in Storage by uploading the [sample CSV file](/transformations/source.csv).
- Create an input mapping from that table, setting its destination to `source` (as expected by the BigQuery script).
- Create an output mapping, setting its destination to a new table in your Storage.
- Copy & paste the below script into the transformation code.
- Save and run the transformation.

```sql
CREATE OR REPLACE TABLE `result` AS
SELECT `first`, CAST(`second` AS INT64) * 42 AS `larger_second`
FROM `source`;
```

![Screenshot - Sample Transformation](/transformations/bigquery/sample-transformation.png)

You can organize the script into [blocks](/transformations/#writing-scripts).

## Best Practices

### Working With Data Types
Keboola Storage tables store data in character types. When creating a table for output mapping in BigQuery, you can rely on implicit casting to STRING:

```sql
CREATE OR REPLACE TABLE test (ID STRING, TM TIMESTAMP, NUM FLOAT64);

INSERT INTO test (ID, TM, NUM)
SELECT 'first', CURRENT_TIMESTAMP(), 12.5;
```

Alternatively, you can create the table with all columns as STRING and rely on implicit casting:

```sql
CREATE OR REPLACE TABLE test (ID STRING, TM STRING, NUM STRING);

INSERT INTO test (ID, TM, NUM)
SELECT 'first', FORMAT_TIMESTAMP('%F %T', CURRENT_TIMESTAMP()), CAST(12.5 AS STRING);
```

Explicit casting of columns to STRING is also an option:

```sql
CREATE OR REPLACE TABLE test (ID STRING, TM STRING, NUM STRING);

INSERT INTO test (ID, TM, NUM)
SELECT
CAST('first' AS STRING),
CAST(FORMAT_TIMESTAMP('%F %T', CURRENT_TIMESTAMP()) AS STRING),
CAST(12.5 AS STRING)
;
```

For unstructured data types in BigQuery, explicit casting is often necessary:

```sql
CREATE OR REPLACE TABLE test (ID STRING, TM STRING, NUM STRING, OBJ STRING);

INSERT INTO test (ID, TM, NUM, OBJ)
SELECT
'first',
FORMAT_TIMESTAMP('%F %T', CURRENT_TIMESTAMP()),
CAST(12.5 AS STRING),
TO_JSON_STRING(STRUCT('name' AS NAME, '123' AS CIN))
;
```

### UDF

There are two types of user-defined functions in BigQuery: persistent and temporary. Persistent UDFs are stored in a dataset and can be used by any user with access to the dataset. Temporary UDFs are only available during the session in which they are created.

Because BQ transformations always run in a new session (and new dataset), you can only use temporary UDFs. To create a temporary UDF, use the `CREATE TEMP FUNCTION` statement. You can find more information about UDFs in the [BigQuery documentation](https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions).
New to transformations in general? Start with [Transformations](/transformations/) and the [Getting Started tutorial](/tutorial/manipulate/). For BigQuery limitations specific to Keboola, see [BigQuery Limitations](/storage/byodb/#bigquery-limitations).