diff --git a/_data/navigation.yml b/_data/navigation.yml index ab499ddb3..53767b949 100644 --- a/_data/navigation.yml +++ b/_data/navigation.yml @@ -623,6 +623,13 @@ items: - url: /transformations/snowflake-plain/ title: Snowflake Transformations + items: + - url: /transformations/snowflake-plain/how-to/ + title: How do I run a Snowflake transformation? + - url: /transformations/snowflake-plain/reference/ + title: Reference + - url: /transformations/snowflake-plain/explanation/ + title: When to use it - url: /transformations/bigquery/ title: BigQuery Transformations diff --git a/revamp/diataxis-page-template.md b/revamp/diataxis-page-template.md new file mode 100644 index 000000000..dbe36e17f --- /dev/null +++ b/revamp/diataxis-page-template.md @@ -0,0 +1,154 @@ +# Diátaxis page template + +Copy the block for the page type you are writing into a new Markdown file under +`src/content/docs/…`, then fill it in. One page serves **exactly one** reader +need. If you find yourself writing two of these on one page, split it. + +Frontmatter is identical across types except `type:`. Required keys: `title`, +`slug`, `description`, `keywords`, `type`. Add `redirect_from` only on a page +that takes over an old URL (usually the hub). + +Conventions to match (see existing pages): +- **Title + `description` in the user's words / symptom vocabulary**, not feature + labels. Cover singular/plural + obvious synonyms in `keywords`. +- **The text must carry all the meaning.** Remove every screenshot and the page + must still be fully doable. Keep a screenshot only if it genuinely helps a + human locate something in the UI, and give it real alt text. +- **All code/config in fenced blocks**, never as a screenshot. +- Root-relative links (`/transformations/…/`). +- Anything you can't verify against the component code / config schema → + `` inline. Never silently add/rename/remove fields. + +--- + +## How-to + +```markdown +--- +title: How do I ? +slug: '
//how-to' +description: +keywords: + - + - + - +type: how-to +--- + +<1–2 sentences: the situation the reader is in and what this page gets them to. +Link to the matching explanation and reference pages.> + +**Time:** ~N minutes · **You will need:** + +## Before you start + + + +## Step 1 — + +1. +2. + +## Step 2 — + +… + +## Step N — Run it and confirm it worked + +1. +2. + +## Troubleshooting + +| Symptom | Likely cause | Fix | +|---|---|---| +| | | | + +## Related + +- [](…) +- [](…) +``` + +--- + +## Reference + +```markdown +--- +title: reference +slug: '
//reference' +description: +keywords: + - + - +type: reference +--- + + + + + +## + +| | | Notes | +|---|---|---| +| … | … | … | + + +``` + +--- + +## Explanation + +```markdown +--- +title: When should I use ? (or: Understanding ) +slug: '
//explanation' +description: +keywords: + - + - when to use + - vs +type: explanation +--- + + + +## What it is + +## Why / how it fits + +## When to use it (and when not to) + + +``` + +--- + +## The hub (thin link page left at the OLD url) + +```markdown +--- +title: +slug: '' +description: +keywords: + - +type: explanation +redirect_from: + - +--- + +<1–2 sentences of what the thing is.> + +This page is split by what you need: + +- **[How do I …?](…/how-to/)** — … +- **[… reference](…/reference/)** — … +- **[When should I use …?](…/explanation/)** — … +``` diff --git a/revamp/diataxis-split-checklist.md b/revamp/diataxis-split-checklist.md new file mode 100644 index 000000000..4deeb8c48 --- /dev/null +++ b/revamp/diataxis-split-checklist.md @@ -0,0 +1,71 @@ +# Diátaxis split checklist + +Standard checklist for splitting one "frankenstein" page into how-to / reference +/ explanation + hub. Use it per page for Issue B (the remaining ~15). The +Snowflake transformation split is the validated reference example. + +## 0. Classify (Block 0) + +- [ ] Find the page's row in the Block 0 classification (Linear PRDCT-354): its + Diátaxis type(s), frankenstein flag, audience, and **machine + source-of-truth** (the component repo / config schema to verify against). +- [ ] Confirm it is actually a frankenstein (mixes ≥2 of how-to / reference / + explanation). If it is a clean single type, it does not need splitting. + +## 1. Inventory the source + +- [ ] Read the source page top to bottom; list every section. +- [ ] Tag each section: how-to (a task), reference (lookup), or explanation + (why/when). This mapping is the split plan — get it reviewed before writing. +- [ ] Note every screenshot and decide: does the text already carry it? Drop it + unless it helps locate something in the UI. +- [ ] Note every inbound link/anchor you'll need to preserve. + +## 2. 
Verify facts against code (not the UI, not the old text) + +- [ ] For each field / parameter / type / limit, check the component code or + config schema (the Block 0 source-of-truth). +- [ ] Anything you cannot verify → `` inline. +- [ ] Do **not** add, rename, or remove config fields — flag for a human instead. + +## 3. Write the three pages (use the template) + +- [ ] How-to, reference, explanation each created from the template. +- [ ] Frontmatter on every page: `title`, `slug`, `description`, `keywords`, + `type` (+ `redirect_from` where relevant). +- [ ] Titles + descriptions in the user's words / symptom vocabulary; keywords + cover singular/plural + synonyms. +- [ ] How-to has: literal control names + nav paths, copy-pasteable config, an + explicit "confirm it worked" step, and a troubleshooting section. +- [ ] Reference is lookup-only; explanation is conceptual-only. +- [ ] All code/config in fenced blocks. No code in screenshots. +- [ ] Clean up migration leftovers on the pages you create (e.g. `{: width}`, + stale anchors). + +## 4. Hub + redirect (don't break the old URL) + +- [ ] Replace the old page at its existing slug with a thin hub linking to the + three new pages. +- [ ] Keep existing `redirect_from` entries; the old URL must still resolve. +- [ ] Cross-link: each new page links back to the other two. + +## 5. Wire navigation + +- [ ] Add the three pages under the hub in `_data/navigation.yml`. +- [ ] `npm run gen:sidebar` (don't hand-edit `src/sidebar.mjs`). + +## 6. Verify the build + +- [ ] `npm run build` is clean. +- [ ] `node scripts/audit-phase2.mjs` shows no new issues on the pages you touched. +- [ ] Every cross-page anchor resolves (new heading IDs + the ones you link to on + existing pages). +- [ ] The old URL 301-redirects to the hub. + +## 7. Ship + +- [ ] Branch name and PR title carry the Linear id (e.g. `PRDCT-354: …`). +- [ ] Touch only the pilot page and its split outputs — nothing else. 
+- [ ] PR body lists: BEFORE → AFTER (what went where) and the human-review queue + (every `TODO(human-review)`). +- [ ] Share a preview link for review. diff --git a/src/content.config.ts b/src/content.config.ts index 25497070f..5a5b18a5f 100644 --- a/src/content.config.ts +++ b/src/content.config.ts @@ -13,6 +13,10 @@ export const collections = { icon: z.string().optional(), section: z.string().optional(), beacon: z.boolean().optional(), + // Docs revamp (Diátaxis) — every revamped page declares the single + // reader need it serves, plus user-vocabulary keywords for search/RAG. + keywords: z.array(z.string()).optional(), + type: z.enum(['how-to', 'reference', 'explanation']).optional(), }), }), }), diff --git a/src/content/docs/transformations/snowflake-plain/explanation.md b/src/content/docs/transformations/snowflake-plain/explanation.md new file mode 100644 index 000000000..d2737b1cd --- /dev/null +++ b/src/content/docs/transformations/snowflake-plain/explanation.md @@ -0,0 +1,59 @@ +--- +title: When should I use a Snowflake transformation? +slug: 'transformations/snowflake-plain/explanation' +description: Understand what a Snowflake SQL transformation is in Keboola, why and when to choose it over Python, R, BigQuery, or DuckDB, and how it fits into the input-mapping → script → output-mapping flow. +keywords: + - Snowflake transformation + - Snowflake transformations + - when to use Snowflake transformation + - SQL transformation Keboola + - Snowflake vs Python transformation + - Snowflake backend +type: explanation +--- + +A **Snowflake transformation** runs your [SQL](https://www.snowflake.com/) against a Snowflake database that Keboola manages for you. You write `SELECT` / `CREATE TABLE` statements; Keboola takes care of the warehouse, the staging area, and moving results back to [Storage](/storage/tables/). This page explains what that means and when it is the right choice. 
To build one, follow the [how-to](/transformations/snowflake-plain/how-to/); for exact limits and syntax rules, see the [reference](/transformations/snowflake-plain/reference/). + +## What it is + +Like every [transformation](/transformations/), a Snowflake transformation operates on an isolated copy of your data, not on Storage directly: + +1. **Input mapping** copies the Storage tables you name into a temporary staging schema. +2. Your **SQL script** runs against that staging schema. +3. **Output mapping** writes the resulting tables back to Storage. + +Because it works on a copy, you can rename or restructure Storage tables without breaking the script, and a failed run never corrupts your source data. + +## Why Snowflake + +Snowflake is a cloud data warehouse, which removes most of the operational burden of traditional databases: + +- **No database administration** — no servers, vacuuming, or patching to manage. +- **No indexes, sort keys, distribution styles, or column compression** to design and tune. +- **Easy scaling** — increase the [backend size](/transformations/snowflake-plain/reference/#backend-sizes-dynamic-backends) when a job needs more power, without rewriting anything. +- **Simple data types** and a familiar SQL dialect. +- **Strong processing power and throughput** for large joins and aggregations. + +Being a managed cloud service, Snowflake also ships continuous updates; occasionally that means behavioral changes worth tracking in the [release notes](https://docs.snowflake.com/en/release-notes/overview). + +## When to use it (and when not to) + +Choose a Snowflake transformation when: + +- Your logic is naturally expressed in **SQL** — joins, aggregations, filtering, denormalizing, integrity checks. +- Your data is **tabular** and you want set-based processing close to where the data already lives. +- You want to scale up heavy jobs simply by [changing the backend size](/transformations/snowflake-plain/how-to/#make-it-faster-backend-size). 
+ +Consider a different backend when: + +- You need **procedural code**, custom libraries, or ML — use a [Python](/transformations/python-plain/) or [R](/transformations/r-plain/) transformation. +- Your project runs on a different warehouse — Keboola also offers [BigQuery](/transformations/bigquery/), [DuckDB](/transformations/duckdb/), and [Oracle](/transformations/oracle/) transformations. The concepts on this page are the same; the SQL dialect and limits differ. + +## Things to understand up front + +Two Snowflake behaviors trip people up; both are detailed in the [reference](/transformations/snowflake-plain/reference/): + +- **Case sensitivity.** Snowflake folds unquoted identifiers to upper case, but Keboola creates tables and columns in their original case. Quote your identifiers (`"my_column"`) so they match — see [identifier case sensitivity](/transformations/snowflake-plain/reference/#identifier-case-sensitivity). +- **Everything lands as character data.** Storage stores columns as character types, so values are cast to char on output — and `ARRAY`, `OBJECT`, and `VARIANT` must be cast explicitly. See [working with data types](/transformations/snowflake-plain/reference/#working-with-data-types). + +Understanding these two points early saves most of the debugging time newcomers spend on Snowflake transformations. diff --git a/src/content/docs/transformations/snowflake-plain/how-to.md b/src/content/docs/transformations/snowflake-plain/how-to.md new file mode 100644 index 000000000..4e84eda0d --- /dev/null +++ b/src/content/docs/transformations/snowflake-plain/how-to.md @@ -0,0 +1,93 @@ +--- +title: How do I run a Snowflake transformation? +slug: 'transformations/snowflake-plain/how-to' +description: Create, configure, and run a Snowflake SQL transformation in Keboola from start to finish — set input mapping, write the SQL script, set output mapping, run it, and confirm the result table landed in Storage. 
+keywords: + - run a Snowflake transformation + - create Snowflake transformation + - Snowflake SQL transformation Keboola + - how to write SQL transformation + - Snowflake transformation example +type: how-to +--- + +You have a table in Keboola Storage and you want to transform it with SQL and write the result back to Storage. This page takes you from nothing to a finished, successful run using a small worked example. For the concepts behind it, see [when to use a Snowflake transformation](/transformations/snowflake-plain/explanation/); for exact limits and syntax rules, see the [reference](/transformations/snowflake-plain/reference/). + +**Time:** ~10 minutes · **You will need:** a Keboola project where you can create configurations, and one table in [Storage](/storage/tables/) to read from. + +## Before you start + +Get a table into Storage to use as the input. If you do not have one handy, upload the [sample CSV file](/transformations/source.csv) as a new table (Storage → your bucket → **Create Table**) — the example SQL below expects a `source` table with `first` and `second` columns. + +## Step 1 — Create the transformation + +1. Open **Components → Transformations**. +2. Click **New Transformation**. +3. Choose **Snowflake SQL Transformation** as the type. +4. Give it a descriptive name (for example, `Double the second column`) and confirm. + +You now have an empty transformation configuration with sections for input mapping, the script, and output mapping. + +## Step 2 — Add the input mapping + +The input mapping copies a Storage table into the transformation's staging area under a name your script will use. + +1. In **Input Mapping**, click **New Table Input**. +2. Set **Source** to your Storage table (the sample table from *Before you start*). +3. Set the **Destination** (the staging table name) to `source`. +4. Save the mapping. 
+ +## Step 3 — Write the SQL script + +In the transformation's code editor, paste: + +```sql +CREATE OR REPLACE TABLE "result" AS + SELECT "first", "second" * 42 AS "larger_second" FROM "source"; +``` + +This reads the staged `source` table and creates a `result` table with the `first` column and `second` multiplied by 42. + +Quote table and column names (`"source"`, `"first"`). Snowflake folds unquoted names to upper case, which won't match the identifiers Keboola created — see [identifier case sensitivity](/transformations/snowflake-plain/reference/#identifier-case-sensitivity). You can split longer scripts into [blocks](/transformations/#writing-scripts) to keep them organized. + +## Step 4 — Add the output mapping + +The output mapping writes a staging table back to permanent Storage. Without it, your `result` table is discarded when the job ends. + +1. In **Output Mapping**, click **New Table Output**. +2. Set **Source** (the staging table the script created) to `result`. +3. Set **Destination** to a new Storage table, for example `out.c-main.result`. +4. Save the mapping. + +## Step 5 — Run it and confirm the result + +1. Click **Run** on the transformation. +2. Wait for the [job](/management/jobs/) to finish with a green/success status. +3. Open **Storage**, find your destination table (`out.c-main.result`), and check the data sample: it should contain `first` and `larger_second`, with `larger_second` equal to `second × 42`. + +If the table is there with the expected values, the transformation works. + +## Make it faster (backend size) + +If the job is slow because of large data or complex queries, raise the **backend size** in the configuration (XSmall → Small → Medium → Large). A bigger backend allocates more resources; the available sizes and the default are listed in the [reference](/transformations/snowflake-plain/reference/#backend-sizes-dynamic-backends). Dynamic backends are not available on the Free Plan. 
+ +## Stop a run on a condition + +To abort a transformation deliberately (for example, when an integrity check fails) and return a user error, set the `ABORT_TRANSFORMATION` variable in your script. See [aborting execution](/transformations/snowflake-plain/reference/#aborting-execution-abort_transformation). + +## Troubleshooting + +| Symptom | Likely cause | Fix | +|---|---|---| +| `table source not found` (or similar) | Input mapping destination name doesn't match the script | Make sure the input **Destination** is exactly `source` and the script references `"source"`. | +| `table footable not found` despite the table existing | Identifier case mismatch — unquoted names are folded to upper case | Quote identifiers (`"source"`, `"first"`); see [case sensitivity](/transformations/snowflake-plain/reference/#identifier-case-sensitivity). | +| Run succeeds but nothing appears in Storage | No output mapping, or wrong **Source** staging name | Add an output mapping whose **Source** matches the table your script created (`result`). | +| `Expression type does not match column data type ... got OBJECT` | An `ARRAY`/`OBJECT`/`VARIANT` value wasn't cast to char | Cast explicitly with `TO_CHAR(...)`; see [working with data types](/transformations/snowflake-plain/reference/#working-with-data-types). | +| Transformation aborted with a user error | `ABORT_TRANSFORMATION` was set to a non-empty value | Expected if you use the abort pattern; otherwise check the logic that sets the variable. | + +## Related + +- [Snowflake transformation reference](/transformations/snowflake-plain/reference/) — limits, data types, timestamps, backend sizes. +- [When should I use a Snowflake transformation?](/transformations/snowflake-plain/explanation/) — concepts and trade-offs. +- [Input and output mapping](/transformations/mappings/) — how staging works in detail. +- [Tutorial: Manipulating data](/tutorial/manipulate/) — guided first transformation. 
diff --git a/src/content/docs/transformations/snowflake-plain/index.md b/src/content/docs/transformations/snowflake-plain/index.md index 0d85bbf48..833c679d3 100644 --- a/src/content/docs/transformations/snowflake-plain/index.md +++ b/src/content/docs/transformations/snowflake-plain/index.md @@ -1,250 +1,22 @@ --- -title: Snowflake Transformation +title: Snowflake Transformations slug: 'transformations/snowflake-plain' +description: Run SQL against a managed Snowflake database in Keboola. Start here, then jump to the how-to, reference, or the explanation of when to use it. +keywords: + - Snowflake transformation + - Snowflake transformations + - Snowflake SQL transformation +type: explanation redirect_from: - /transformations/snowflake/ --- +A **Snowflake transformation** runs your SQL against a Snowflake database that Keboola manages — you write the queries, Keboola handles the warehouse, staging, and loading results back to [Storage](/storage/tables/). +This page is split by what you need: -[Snowflake](https://www.snowflake.com/) has many advantages: +- **[How do I run a Snowflake transformation?](/transformations/snowflake-plain/how-to/)** — create, configure, and run one end to end, with a worked example and troubleshooting. +- **[Snowflake transformation reference](/transformations/snowflake-plain/reference/)** — limits, backend sizes, identifier case sensitivity, data-type casting, timestamps, the abort variable, and read-only input mapping. +- **[When should I use a Snowflake transformation?](/transformations/snowflake-plain/explanation/)** — what it is, why Snowflake, and when to choose it over Python, R, BigQuery, or DuckDB. -- No database administration -- No indexes, sort keys, distribution styles, or column compressions -- Easy scaling -- Simple data types -- Amazing processing power and data throughput - -## Limits -- Snowflake queries are **limited** to 7,200 seconds by default. 
-- Queries containing comments longer than 8,192 characters will segfault. -- Constraints (like PRIMARY KEY or UNIQUE) are defined but [not enforced](https://docs.snowflake.com/en/sql-reference/constraints-overview). - -Snowflake is a cloud database and, as such, brings continuous updates and behavioral changes. If you are -interested in those changes, please follow the official [Snowflake release notes](https://docs.snowflake.com/en/release-notes/overview). - -When loading data to a Snowflake transformation, beware that there are two different -methods: [copy and clone](/transformations/mappings/#snowflake-loading-type). - -## Aborting Transformation Execution -In some cases, you may need to abort the transformation execution and exit with an error message. -To abort the execution, set the `ABORT_TRANSFORMATION` variable to any nonempty string value. - -```sql -SET ABORT_TRANSFORMATION = ( - SELECT - CASE - WHEN COUNT = 0 THEN '' - ELSE 'Integrity check failed' - END - FROM ( - SELECT COUNT(*) AS COUNT FROM INTEGRITY_CHECK WHERE RESULT = 'failed' - ) -); -``` - -This example will set the `ABORT_TRANSFORMATION` variable value to `'Integrity check failed'` if the `INTEGRITY_CHECK` table -contains one or more records with the `RESULT` column equal to the value `'failed'`. - -The transformation engine checks the `ABORT_TRANSFORMATION` after each successfully executed query and returns the value -of the variable as a user error, `Transformation aborted: Integrity check failed.` in this case. - -![Screenshot - Transformation aborted](/transformations/snowflake-plain/abort.png) - -## Dynamic Backends -If you have a large amount of data in databases and complex queries, your transformation might run for a couple of hours. -To speed it up, you can change the backend size in the configuration. 
Snowflake transformations suport the following sizes: -- XSmall -- Small _(default)_ -- Medium -- Large - -![Screenshot - Backend size configuration](/transformations/snowflake-plain/backend-size.png) - -Scaling up the backend size allocates more resources to speed up your transformation. - -***Note:** Dynamic backends are not available to you if you are on the [Free Plan (Pay As You Go)](/management/payg-project/).* - -## Example -To create a simple Snowflake transformation, follow these steps: - -- Create a table in Storage by uploading the [sample CSV file](/transformations/source.csv). -- Create an input mapping from that table, setting its destination to `source` (as expected by the Snowflake script). -- Create an output mapping, setting its destination to a new table in your Storage. -- Copy & paste the below script into the transformation code. -- Save and run the transformation. - -```sql -CREATE OR REPLACE TABLE "result" AS - SELECT "first", "second" * 42 AS "larger_second" FROM "source"; -``` - -![Screenshot - Sample Transformation](/transformations/snowflake-plain/sample-transformation.png) - -You can organize the script into [blocks](/transformations/#writing-scripts). - -## Best Practices - -### Case Sensitivity -Snowflake is [case sensitive](https://docs.snowflake.com/en/sql-reference/identifiers-syntax#label-identifier-casing). -All unquoted table/column names are converted to upper case while quoted names keep their case. - -So, if you want to create the following table, - -```sql --- creates table FOOTABLE -CREATE TABLE footable (...); -``` - -all of these commands will work - -```sql -SELECT * FROM FOOTABLE; -SELECT * FROM "FOOTABLE"; -SELECT * FROM footable; -``` - -while this one will not: - -```sql --- table footable not found! -SELECT * FROM "footable"; -``` - -Be especially careful when setting up [input and output mappings](/transformations/mappings/). 
- -When writing your transformation script, quoting all table and column names is highly recommended. -Snowflake converts all unquoted table/column identifiers to uppercase, which won't match table/column -identifiers created by Keboola (unless they happen to be all uppercase). - -```sql -SELECT "barcolumn" FROM "footable"; -``` - -### Working With Data Types -Storage [tables](/storage/tables/) store data in character types. When you create a table used on output mapping, -you can rely on implicit casting to char: - -```sql -CREATE OR REPLACE TABLE "test" (ID VARCHAR, TM TIMESTAMP, NUM NUMERIC); - -INSERT INTO "test" (ID, TM, NUM) -SELECT 'first', CURRENT_TIMESTAMP, 12.5; -``` - -Or, you can create the table directly with character columns (and rely on implicit casting to char): - -```sql -CREATE OR REPLACE TABLE "test" (ID VARCHAR, TM VARCHAR, NUM VARCHAR); - -INSERT INTO "test" (ID, TM, NUM) -SELECT 'first', CURRENT_TIMESTAMP, 12.5; -``` - -You can also explicitly cast the columns to char: - -```sql -CREATE OR REPLACE TABLE "test" (ID VARCHAR, TM VARCHAR, NUM VARCHAR); - -INSERT INTO "test" (ID, TM, NUM) -SELECT - TO_CHAR('first'), - TO_CHAR(CURRENT_TIMESTAMP), - TO_CHAR(12.5) -; -``` - -When using an [unstructured data type](https://docs.snowflake.com/en/sql-reference/data-types-semistructured), -you always **have to** use the explicit cast: - -```sql -CREATE OR REPLACE TABLE "test" (ID VARCHAR, TM VARCHAR, NUM VARCHAR, OBJ VARCHAR); - -INSERT INTO "test" (ID, TM, NUM, OBJ) -SELECT - 'first', - CURRENT_TIMESTAMP, - 12.5, - TO_CHAR( -- <- required! 
- OBJECT_CONSTRUCT( - 'NAME','name', - 'CIN','123' - ) - ) -; -``` - -The implicit cast does not work for the `ARRAY`, `OBJECT` and `VARIANT` types, so the following code: - -```sql -CREATE OR REPLACE TABLE "test" (ID VARCHAR, TM TIMESTAMP, NUM NUMERIC, OBJ OBJECT); - -INSERT INTO "test" (ID, TM, NUM, OBJ) -SELECT - 'first', - CURRENT_TIMESTAMP, - 12.5, - OBJECT_CONSTRUCT( - 'NAME','name', - 'CIN','123' - ) -; -``` - -will lead to an error: - -``` -Expression type does not match column data type, expecting VARCHAR(16777216) but got OBJECT for column OBJ, SQL state 22000 -``` - -### Timestamp Columns -By default, Snowflake uses the -`YYYY-MM-DD HH24:MI:SS.FF3` [format](https://docs.snowflake.com/en/sql-reference/functions-conversion#label-date-time-format-conversion) -when converting the `timestamp` column to a character string. - -This means that if you create a table in a transformation that uses a `timestamp` column, - -```sql -CREATE TABLE "ts_test" AS (SELECT CURRENT_TIMESTAMP AS "ts"); -``` - -the table value will come out as `2018-04-09 06:43:57.866 -0700` in Storage. If you -want to output it in a different format, you have to cast the column to a string first, for example: - -```sql -CREATE TABLE "out" AS - (SELECT TO_CHAR("ts", 'YYYY-MM-DD HH:MI:SS') AS "ts" FROM "ts_test"); -``` - -Do not use `ALTER SESSION` queries to modify the default timestamp format, as the loading and unloading sessions are separate -from your transformation/sandbox session and the format may change unexpectedly. 
- -**Important:** In the AWS US Keboola [region](https://developers.keboola.com/overview/api/#regions-and-endpoints) -(connection.keboola.com), the following [Snowflake default](https://docs.snowflake.com/en/sql-reference/parameters) -parameters are overridden: - -- [TIMESTAMP_OUTPUT_FORMAT](https://docs.snowflake.com/en/sql-reference/parameters) -- `DY, DD MON YYYY HH24:MI:SS TZHTZM` -- [TIMESTAMP_TYPE_MAPPING](https://docs.snowflake.com/en/sql-reference/parameters) -- `TIMESTAMP_LTZ` -- [TIMESTAMP_DAY_IS_ALWAYS_24H](https://docs.snowflake.com/en/sql-reference/parameters) -- `yes` - -**Important:** Snowflake works with time zones (and [Daylight Savings Time](https://en.wikipedia.org/wiki/Daylight_saving_time)), -requiring you to distinguish between various conversion functions: - -```sql -SELECT - -- yields 2013-03-10 02:12:00.000 +0000 - TO_TIMESTAMP_NTZ('10.3.2013 2:12', 'DD.MM.YYYY HH:MI'), - -- yields 2013-03-10 03:12:00.000 -0700 - TO_TIMESTAMP_TZ('10.3.2013 2:12', 'DD.MM.YYYY HH:MI'), - -- yields 2013-03-10 03:12:00.000 -0700 - TO_TIMESTAMP('10.3.2013 2:12', 'DD.MM.YYYY HH:MI'); -``` - -## Bucket Objects for Read-Only Input Mapping - -For more information on how a **read-only input mapping** works, visit the [link](/transformations/mappings/#read-only-input-mapping). -Buckets in Snowflake are represented by schemas. You can find all available schemas for your account by calling `SHOW SCHEMAS IN ACCOUNT;`. Each schema represents a bucket. -Alias tables are materialized as database VIEWs and are accessible via read-only input mappings — including filtered aliases and aliases from linked buckets. -For a linked bucket, the schema is available in another database. That is, to access this linked bucket you have to include the database name of the project from which the bucket is linked. -For example, say your bucket `in.c-customers` is linked from bucket `in.c-crm-extractor` in project 123. 
You then need to reference the tables in the transformation like this: `"KEBOOLA_123"."in.c-crm-extractor"."my-table"`. -When developing the transformation code, it's easiest to create a workspace with **read-only input mappings** enabled and look directly in the database to find the correct database and schema names. +New to transformations in general? Start with [Transformations](/transformations/) and the [Getting Started tutorial](/tutorial/manipulate/). diff --git a/src/content/docs/transformations/snowflake-plain/reference.md b/src/content/docs/transformations/snowflake-plain/reference.md new file mode 100644 index 000000000..266f9f871 --- /dev/null +++ b/src/content/docs/transformations/snowflake-plain/reference.md @@ -0,0 +1,226 @@ +--- +title: Snowflake transformation reference +slug: 'transformations/snowflake-plain/reference' +description: Lookup reference for Snowflake SQL transformations in Keboola — limits, backend sizes, identifier case sensitivity, data-type casting, timestamp handling, the abort variable, and read-only input mapping. +keywords: + - Snowflake transformation limits + - Snowflake transformation backend size + - Snowflake case sensitivity + - Snowflake data types Keboola + - ABORT_TRANSFORMATION + - Snowflake timestamp format + - read-only input mapping Snowflake +type: reference +--- + +Reference material for [Snowflake SQL transformations](/transformations/snowflake-plain/). To create one, see the [how-to](/transformations/snowflake-plain/how-to/); for when and why to use them, see the [explanation](/transformations/snowflake-plain/explanation/). + + + +## Limits + +| Limit | Value | Notes | +|---|---|---| +| Query runtime | 7,200 seconds (default) | Long-running queries are cancelled past this. | +| Comment length | 8,192 characters | Queries containing a comment longer than this will segfault. 
| +| Constraints | Defined but not enforced | `PRIMARY KEY` / `UNIQUE` are accepted but [not enforced by Snowflake](https://docs.snowflake.com/en/sql-reference/constraints-overview). | + +Snowflake is a cloud database that ships continuous updates and behavioral changes. Track them in the official [Snowflake release notes](https://docs.snowflake.com/en/release-notes/overview). + +## Loading type (copy vs. clone) + +When data is loaded into a Snowflake transformation there are two methods — **copy** and **clone**. They are configured on the input mapping; see [loading type](/transformations/mappings/#loading-type-snowflake-and-bigquery). + +## Backend sizes (dynamic backends) + +A larger backend allocates more resources to speed up a transformation that processes large volumes or complex queries. Set the size in the configuration (see [how to change it](/transformations/snowflake-plain/how-to/#make-it-faster-backend-size)). + +| Size | Notes | +|---|---| +| XSmall | | +| Small | Default | +| Medium | | +| Large | | + + + +Dynamic backends are **not** available on the [Free Plan (Pay As You Go)](/management/payg-project/). + +## Aborting execution (`ABORT_TRANSFORMATION`) + +To stop a transformation and exit with a user error, set the `ABORT_TRANSFORMATION` variable to any non-empty string. The engine checks it after each successfully executed query and returns the value as a user error (for example, `Transformation aborted: Integrity check failed.`). + +```sql +SET ABORT_TRANSFORMATION = ( + SELECT + CASE + WHEN COUNT = 0 THEN '' + ELSE 'Integrity check failed' + END + FROM ( + SELECT COUNT(*) AS COUNT FROM INTEGRITY_CHECK WHERE RESULT = 'failed' + ) +); +``` + +The example sets `ABORT_TRANSFORMATION` to `'Integrity check failed'` when the `INTEGRITY_CHECK` table has one or more rows with `RESULT = 'failed'`. An empty string does not abort. 
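The same pattern works for other guards. A minimal sketch, assuming the script has already created the quoted `"result"` table from the how-to example; the message text is hypothetical, and the identifiers are quoted so they match lower-case names:

```sql
-- Hypothetical guard: abort when the "result" table built earlier in the script is empty.
SET ABORT_TRANSFORMATION = (
  SELECT
    CASE
      WHEN COUNT(*) > 0 THEN ''          -- empty string: the transformation continues
      ELSE 'Result table is empty'       -- non-empty string: the transformation aborts
    END
  FROM "result"
);
```

Following the behavior described above, a non-empty value here should come back as a user error carrying that message.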
+ +## Identifier case sensitivity + +Snowflake is [case sensitive](https://docs.snowflake.com/en/sql-reference/identifiers-syntax#label-identifier-casing). Unquoted table/column names are folded to **upper case**; quoted names keep their case. Keboola creates tables and columns with their original case, so unquoted identifiers in your script may not match. + +Given a table created unquoted: + +```sql +-- creates table FOOTABLE +CREATE TABLE footable (...); +``` + +all of these match it: + +```sql +SELECT * FROM FOOTABLE; +SELECT * FROM "FOOTABLE"; +SELECT * FROM footable; +``` + +while this does **not**: + +```sql +-- table footable not found! +SELECT * FROM "footable"; +``` + +Quoting every table and column name is strongly recommended so identifiers match what Keboola created: + +```sql +SELECT "barcolumn" FROM "footable"; +``` + +This matters most when setting up [input and output mappings](/transformations/mappings/). + +## Working with data types + +Storage [tables](/storage/tables/) store data as character types. 
When a table is used in an output mapping, you can rely on implicit casting to character types: + +```sql +CREATE OR REPLACE TABLE "test" (ID VARCHAR, TM TIMESTAMP, NUM NUMERIC); + +INSERT INTO "test" (ID, TM, NUM) +SELECT 'first', CURRENT_TIMESTAMP, 12.5; +``` + +Or create the table with character columns directly: + +```sql +CREATE OR REPLACE TABLE "test" (ID VARCHAR, TM VARCHAR, NUM VARCHAR); + +INSERT INTO "test" (ID, TM, NUM) +SELECT 'first', CURRENT_TIMESTAMP, 12.5; +``` + +Or cast explicitly: + +```sql +CREATE OR REPLACE TABLE "test" (ID VARCHAR, TM VARCHAR, NUM VARCHAR); + +INSERT INTO "test" (ID, TM, NUM) +SELECT + TO_CHAR('first'), + TO_CHAR(CURRENT_TIMESTAMP), + TO_CHAR(12.5) +; +``` + +For [semi-structured types](https://docs.snowflake.com/en/sql-reference/data-types-semistructured) you **must** cast explicitly: + +```sql +CREATE OR REPLACE TABLE "test" (ID VARCHAR, TM VARCHAR, NUM VARCHAR, OBJ VARCHAR); + +INSERT INTO "test" (ID, TM, NUM, OBJ) +SELECT + 'first', + CURRENT_TIMESTAMP, + 12.5, + TO_CHAR( -- <- required! + OBJECT_CONSTRUCT( + 'NAME','name', + 'CIN','123' + ) + ) +; +``` + +Implicit casting does **not** work for `ARRAY`, `OBJECT`, and `VARIANT`. This code: + +```sql +CREATE OR REPLACE TABLE "test" (ID VARCHAR, TM TIMESTAMP, NUM NUMERIC, OBJ OBJECT); + +INSERT INTO "test" (ID, TM, NUM, OBJ) +SELECT + 'first', + CURRENT_TIMESTAMP, + 12.5, + OBJECT_CONSTRUCT( + 'NAME','name', + 'CIN','123' + ) +; +``` + +fails with: + +``` +Expression type does not match column data type, expecting VARCHAR(16777216) but got OBJECT for column OBJ, SQL state 22000 +``` + +## Timestamp columns + +By default Snowflake uses the `YYYY-MM-DD HH24:MI:SS.FF3` [format](https://docs.snowflake.com/en/sql-reference/functions-conversion#label-date-time-format-conversion) when converting a `timestamp` column to a character string, and appends the UTC offset for time-zone-aware values such as `CURRENT_TIMESTAMP`. So: + +```sql +CREATE TABLE "ts_test" AS (SELECT CURRENT_TIMESTAMP AS "ts"); +``` + +lands in Storage as `2018-04-09 06:43:57.866 -0700`. 
To control the output, cast to a string first: + +```sql +CREATE TABLE "out" AS + (SELECT TO_CHAR("ts", 'YYYY-MM-DD HH:MI:SS') AS "ts" FROM "ts_test"); +``` + +Do **not** use `ALTER SESSION` to change the default timestamp format — the loading and unloading sessions are separate from your transformation/sandbox session and the format may change unexpectedly. + +In the AWS US Keboola [region](https://developers.keboola.com/overview/api/#regions-and-endpoints) (`connection.keboola.com`), these [Snowflake parameters](https://docs.snowflake.com/en/sql-reference/parameters) are overridden: + +| Parameter | Value | +|---|---| +| `TIMESTAMP_OUTPUT_FORMAT` | `DY, DD MON YYYY HH24:MI:SS TZHTZM` | +| `TIMESTAMP_TYPE_MAPPING` | `TIMESTAMP_LTZ` | +| `TIMESTAMP_DAY_IS_ALWAYS_24H` | `yes` | + +Snowflake also works with time zones (and [Daylight Saving Time](https://en.wikipedia.org/wiki/Daylight_saving_time)), so distinguish the conversion functions: + +```sql +SELECT + -- yields 2013-03-10 02:12:00.000 +0000 + TO_TIMESTAMP_NTZ('10.3.2013 2:12', 'DD.MM.YYYY HH:MI'), + -- yields 2013-03-10 03:12:00.000 -0700 + TO_TIMESTAMP_TZ('10.3.2013 2:12', 'DD.MM.YYYY HH:MI'), + -- yields 2013-03-10 03:12:00.000 -0700 + TO_TIMESTAMP('10.3.2013 2:12', 'DD.MM.YYYY HH:MI'); +``` + +## Read-only input mapping: buckets and schemas + +How a read-only input mapping works in general is described under [read-only input mapping](/transformations/mappings/#read-only-input-mapping). + +- Buckets are represented by **schemas**. List every schema available to your account with `SHOW SCHEMAS IN ACCOUNT;` — each schema is a bucket. +- Alias tables are materialized as database **views** and are reachable via read-only input mappings, including filtered aliases and aliases from linked buckets. +- For a **linked bucket**, the schema lives in another database, so you must include that project's database name. 
Example: bucket `in.c-customers` linked from `in.c-crm-extractor` in project `123` is referenced as `"KEBOOLA_123"."in.c-crm-extractor"."my-table"`. + +When developing, the easiest way to find the correct database and schema names is to create a [workspace](/transformations/#developing-transformations) with read-only input mappings enabled and inspect the database directly. diff --git a/src/sidebar.mjs b/src/sidebar.mjs index 2f7434be9..8543b6c15 100644 --- a/src/sidebar.mjs +++ b/src/sidebar.mjs @@ -441,7 +441,16 @@ export const sidebar = [ { slug: "transformations/r-plain/binary" }, ], }, - { slug: "transformations/snowflake-plain" }, + { + label: "Snowflake Transformations", + collapsed: true, + items: [ + { label: "Overview", slug: "transformations/snowflake-plain" }, + { slug: "transformations/snowflake-plain/how-to" }, + { slug: "transformations/snowflake-plain/reference" }, + { slug: "transformations/snowflake-plain/explanation" }, + ], + }, { slug: "transformations/bigquery" }, { slug: "transformations/duckdb" }, { slug: "transformations/oracle" },