From 663c08d3468b0f886c9b7e1050f64df3f7e6b527 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ji=C5=99=C3=AD=20Nov=C3=A1k?= Date: Wed, 24 Jun 2026 15:23:22 +0200 Subject: [PATCH 1/3] docs: AJDA-2839 add BigQuery section to Storage Access docs --- .../docs/data-apps/storage-access/index.md | 57 +++++++++++++++++-- 1 file changed, 53 insertions(+), 4 deletions(-) diff --git a/src/content/docs/data-apps/storage-access/index.md b/src/content/docs/data-apps/storage-access/index.md index 8320d7ac6..395c53656 100644 --- a/src/content/docs/data-apps/storage-access/index.md +++ b/src/content/docs/data-apps/storage-access/index.md @@ -13,8 +13,8 @@ Storage Access allows your Data App to read data from and write data back to Keb This feature is available for both **Streamlit** and **Python/JS** Data Apps. Code examples on this page use Python; the same concepts apply when calling the Query Service API from JavaScript. -:::caution -**Snowflake only.** Storage Access currently works only on projects using the Snowflake storage backend. BigQuery support is coming soon. +:::note +Storage Access works on both **Snowflake** and **BigQuery** backends. The SQL examples on this page use Snowflake identifier quoting (`"bucket"."table"`). On BigQuery, identifier quoting and table naming differ — see [Working with the BigQuery Backend](#working-with-the-bigquery-backend). ::: ## When to Use Storage Access @@ -48,7 +48,7 @@ Query Service ────► Workspace User ────► Storage Tables billing, metadata refresh with granted permissions ``` -Your app communicates with Storage through the [**Query Service API**](https://api.keboola.com/?service=query), not directly with Snowflake. This provides: +Your app communicates with Storage through the [**Query Service API**](https://api.keboola.com/?service=query), not directly with the underlying storage backend. This provides: - Automatic authentication using your app's token - Usage tracking for billing @@ -196,6 +196,7 @@ print(df.head()) - Use the full table ID in quotes: `"bucket_stage.bucket_name"."table_name"` - Example: `"in.c-sales"."orders"` for a table `orders` in bucket `in.c-sales` +- On **BigQuery**, identifier quoting and dataset names differ — see [Working with the BigQuery Backend](#working-with-the-bigquery-backend). ### Running Custom Queries @@ -317,6 +318,54 @@ if results[0].rows_affected == 0: raise Exception("Record was modified by another user. Please refresh and try again.") ``` +## Working with the BigQuery Backend + +Storage Access works on BigQuery projects as well as Snowflake ones. The setup steps, workspace lifecycle, environment variables, and Query Service client are **identical** across backends — but BigQuery uses a different SQL dialect, so the way you quote identifiers and name tables differs from the Snowflake examples above. + +If your project runs on BigQuery, apply the two rules below to every query. The Query Service passes SQL through to the backend unchanged — it does **not** translate dialects — so your application is responsible for emitting the correct syntax for the project's backend. + +### Quote identifiers with backticks, per segment + +BigQuery quotes identifiers with backticks (`` ` ``) instead of Snowflake's double quotes, and **each segment must be quoted separately**. Do not wrap the whole `dataset.table` reference in a single pair of backticks — BigQuery interprets a dotted name inside one pair of backticks as `project.dataset.table` and tries to resolve the first segment as a Google Cloud project (you'll see an error such as `The project has not enabled BigQuery`). + +```sql +-- ✅ Correct — each segment quoted on its own +SELECT * FROM `my_dataset`.`customers` LIMIT 1000 + +-- ❌ Wrong — BigQuery reads `my_dataset` as a project ID +SELECT * FROM `my_dataset.customers` LIMIT 1000 +``` + +| Backend | Identifier quoting | Example | +| --- | --- | --- | +| Snowflake | Double quotes | `"in.c-main"."customers"` | +| BigQuery | Backticks, per segment | `` `in_c_main`.`customers` `` | + +### Reference the dataset by its mangled bucket name + +BigQuery dataset names cannot contain dots (`.`) or hyphens (`-`), so a Keboola bucket is **not** exposed under its literal bucket ID. The bucket ID is mapped to a dataset name by replacing every `.` and `-` with an underscore (`_`): + +| Keboola bucket ID | BigQuery dataset name | +| --- | --- | +| `in.c-main` | `in_c_main` | +| `out.c-Test-Data---Customers-Products-Orders` | `out_c_Test_Data___Customers_Products_Orders` | + +So a table `customers` in bucket `out.c-Test-Data---Customers-Products-Orders` is referenced as: + +```sql +SELECT * FROM `out_c_Test_Data___Customers_Products_Orders`.`customers` LIMIT 100 +``` + +Only the **dataset** (bucket) name is mangled — the **table** name keeps its original form. A table named `cashier-data`, for example, stays `cashier-data`; it just needs backticks because of the hyphen. + +:::tip[Find the exact names in Storage] +You don't have to derive the names by hand. Open the table in **Storage** and go to its **Overview** tab — it shows the **Dataset Name** (the bucket's BigQuery dataset, e.g. `in_c_shared_bucket`) and the **Table Name** to use in your queries. + +If your app needs to discover names dynamically at runtime, you can also query `INFORMATION_SCHEMA.SCHEMATA` to list the datasets the workspace can see. +::: + +Everything else — reads, writes (`INSERT`/`UPDATE`/`DELETE`/`TRUNCATE`), pagination, and the best practices below — works the same as on Snowflake once the identifiers are quoted and named correctly. If you build a single app that must run on both backends, detect the backend type and generate identifiers accordingly. + ## Environment Variables When Storage Access is enabled, the platform sets these environment variables in your Data App container: @@ -595,6 +644,6 @@ logging.info(f"User {current_user} updated record {record_id} to status {new_sta ## Limitations -- **Snowflake only**: Storage Access currently works only with Snowflake backends. BigQuery support is planned for a future release. +- **SQL dialect differs by backend**: On BigQuery you must quote identifiers with backticks per segment and reference datasets by their mangled bucket name — see [Working with the BigQuery Backend](#working-with-the-bigquery-backend). The Query Service does not translate dialects. - **Column-level permissions not supported**: If you grant access to a table, the app can read/write all columns. - **Permission changes require app restart**: If you add or remove tables from the Storage Access configuration, the changes take effect on the next app start (deploy, redeploy, or wake from sleep). From 23ce976bf6a61f3dd80293ee163f6c531382ba40 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ji=C5=99=C3=AD=20Nov=C3=A1k?= Date: Wed, 24 Jun 2026 15:33:35 +0200 Subject: [PATCH 2/3] docs: AJDA-2839 rename "Data Apps" to "Apps" in Storage Access docs --- .../docs/data-apps/storage-access/index.md | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/src/content/docs/data-apps/storage-access/index.md b/src/content/docs/data-apps/storage-access/index.md index 395c53656..b9b79aa76 100644 --- a/src/content/docs/data-apps/storage-access/index.md +++ b/src/content/docs/data-apps/storage-access/index.md @@ -5,13 +5,13 @@ slug: 'data-apps/storage-access' -Storage Access allows your Data App to read data from and write data back to Keboola Storage tables in real-time. Your app connects directly to Keboola's storage through Query Service via SQL, enabling: +Storage Access allows your App to read data from and write data back to Keboola Storage tables in real-time. Your app connects directly to Keboola's storage through Query Service via SQL, enabling: - **Real-time data access**: Always work with the latest data, no redeployment needed - **Write-back capability**: Update, insert into **existing** Storage tables directly from your app - **Interactive applications**: Build data entry forms, approval workflows, and collaborative tools -This feature is available for both **Streamlit** and **Python/JS** Data Apps. Code examples on this page use Python; the same concepts apply when calling the Query Service API from JavaScript. +This feature is available for both **Streamlit** and **Python/JS** Apps. Code examples on this page use Python; the same concepts apply when calling the Query Service API from JavaScript. :::note Storage Access works on both **Snowflake** and **BigQuery** backends. The SQL examples on this page use Snowflake identifier quoting (`"bucket"."table"`). On BigQuery, identifier quoting and table naming differ — see [Working with the BigQuery Backend](#working-with-the-bigquery-backend). @@ -35,10 +35,10 @@ Storage Access works on both **Snowflake** and **BigQuery** backends. The SQL ex ### Architecture Overview -When you enable Storage Access, Keboola creates a dedicated **workspace** for your Data App. This workspace contains a database user with specific permissions (SELECT, INSERT, UPDATE, DELETE, TRUNCATE) on the tables you've selected. +When you enable Storage Access, Keboola creates a dedicated **workspace** for your App. This workspace contains a database user with specific permissions (SELECT, INSERT, UPDATE, DELETE, TRUNCATE) on the tables you've selected. ``` -Your Data App +Your App │ ▼ Query Service ────► Workspace User ────► Storage Tables @@ -84,7 +84,7 @@ This design ensures: ### Step 2: Configure Writable Tables -1. Open your Data App configuration in Keboola. +1. Open your App configuration in Keboola. 2. Go to the **Advanced Settings** tab. 3. Find the **Storage Access** section. 4. Click **+ Add Writable Table**. @@ -98,7 +98,7 @@ This design ensures: #### Configuring Writable Tables Programmatically -If you manage Data App configurations through the Storage API (or via automation/agents) rather than the UI, the same writable-table selection is expressed in the configuration JSON under `storage.output.tables`. Each entry is a table the app gets read/write permissions on: +If you manage App configurations through the Storage API (or via automation/agents) rather than the UI, the same writable-table selection is expressed in the configuration JSON under `storage.output.tables`. Each entry is a table the app gets read/write permissions on: ```json { @@ -118,7 +118,7 @@ If you manage Data App configurations through the Storage API (or via automation - **`destination`** — the full Storage table ID (`..`) the app should be able to read and write. The table must exist before the app is deployed. - **`unload_strategy: "direct-grant"`** — required marker that tells the platform "grant the app's workspace direct SELECT/INSERT/UPDATE/DELETE/TRUNCATE on this table." Tables without this strategy in `storage.output.tables` are not exposed via Storage Access. -To add or remove writable tables programmatically, update the Data App's configuration via the Storage API ([Component Configurations endpoint](https://api.keboola.com/?service=storage#tag--Component-Configurations)) and redeploy the app for the new permissions to take effect. +To add or remove writable tables programmatically, update the App's configuration via the Storage API ([Component Configurations endpoint](https://api.keboola.com/?service=storage#tag--Component-Configurations)) and redeploy the app for the new permissions to take effect. ### Step 3: Deploy Your App @@ -368,7 +368,7 @@ Everything else — reads, writes (`INSERT`/`UPDATE`/`DELETE`/`TRUNCATE`), pagin ## Environment Variables -When Storage Access is enabled, the platform sets these environment variables in your Data App container: +When Storage Access is enabled, the platform sets these environment variables in your App container: | Variable | Description | | --- | --- | @@ -395,7 +395,7 @@ except (KeyError, FileNotFoundError) as e: ) from e ``` -For the full list of environment variables exposed to Data Apps, see the [data-app-python-js runtime README](https://github.com/keboola/data-app-python-js/blob/main/README.md#environment-variables). +For the full list of environment variables exposed to Apps, see the [data-app-python-js runtime README](https://github.com/keboola/data-app-python-js/blob/main/README.md#environment-variables). ## Comparison: Input Mapping vs Direct Storage Access @@ -410,7 +410,7 @@ For the full list of environment variables exposed to Data Apps, see the [data-a **You can use both together:** Input Mapping for reference data that rarely changes, Storage Access for data you need to read/write in real-time. -## Example: Read-Write Data App +## Example: Read-Write App This example shows a simple Flask app that reads records from Storage and allows users to update their status. @@ -632,14 +632,14 @@ def get_cached_data(key, query_fn): **5. Track write operations** -Write operations are automatically tracked by the Query Service for billing purposes. For additional application-level auditing, log to stdout (visible in Data App container logs): +Write operations are automatically tracked by the Query Service for billing purposes. For additional application-level auditing, log to stdout (visible in App container logs): ```python import logging logging.basicConfig(level=logging.INFO) logging.info(f"User {current_user} updated record {record_id} to status {new_status}") -# Output goes to stdout → visible in the Terminal Log tab of your Data App +# Output goes to stdout → visible in the Terminal Log tab of your App ``` ## Limitations From 73ff9b25ba0ae6d8f56996f02fce6227837de142 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ji=C5=99=C3=AD=20Nov=C3=A1k?= Date: Thu, 25 Jun 2026 12:57:39 +0200 Subject: [PATCH 3/3] docs: AJDA-2839 correct BigQuery identifier-quoting guidance MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A two-part `dataset.table` reference works whether quoted per segment (`` `dataset`.`table` ``) or as a single pair (`` `dataset.table` ``) — per-segment quoting is not required. The real failure is adding a third leading segment (the Keboola stage `in`/`out`, or splitting the dotted bucket ID), which BigQuery resolves as a GCP project ("The project has not enabled BigQuery"). Verified live against a real BigQuery project (in_c_shared_bucket.cashier-data): `ds`.`tbl` and `ds.tbl` both succeed; `out`.`ds`.`tbl` and `out.ds.tbl` fail. Rewrites the rule, examples, comparison table, and Limitations note; renames the subsection heading (no internal links referenced the old anchor). --- .../docs/data-apps/storage-access/index.md | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/src/content/docs/data-apps/storage-access/index.md b/src/content/docs/data-apps/storage-access/index.md index b9b79aa76..9d775cfd7 100644 --- a/src/content/docs/data-apps/storage-access/index.md +++ b/src/content/docs/data-apps/storage-access/index.md @@ -324,22 +324,24 @@ Storage Access works on BigQuery projects as well as Snowflake ones. The setup s If your project runs on BigQuery, apply the two rules below to every query. The Query Service passes SQL through to the backend unchanged — it does **not** translate dialects — so your application is responsible for emitting the correct syntax for the project's backend. -### Quote identifiers with backticks, per segment +### Quote identifiers with backticks -BigQuery quotes identifiers with backticks (`` ` ``) instead of Snowflake's double quotes, and **each segment must be quoted separately**. Do not wrap the whole `dataset.table` reference in a single pair of backticks — BigQuery interprets a dotted name inside one pair of backticks as `project.dataset.table` and tries to resolve the first segment as a Google Cloud project (you'll see an error such as `The project has not enabled BigQuery`). +BigQuery quotes identifiers with backticks (`` ` ``) instead of Snowflake's double quotes. A table reference is **two parts** — `dataset.table` — and you may write it either as `` `dataset`.`table` `` or as `` `dataset.table` `` (both are valid; you do not have to quote each segment separately). The trap is **adding a third leading segment**: do not prepend the Keboola stage (`in`/`out`) or a Snowflake-style "database", and do not split the dotted bucket ID into separate backticked parts. BigQuery interprets a three-part name as `project.dataset.table` and tries to resolve the first segment as a Google Cloud project (you'll see an error such as `The project has not enabled BigQuery`). ```sql --- ✅ Correct — each segment quoted on its own -SELECT * FROM `my_dataset`.`customers` LIMIT 1000 +-- ✅ Correct — dataset.table (two parts); either quoting style works +SELECT * FROM `in_c_main`.`customers` LIMIT 1000 +SELECT * FROM `in_c_main.customers` LIMIT 1000 --- ❌ Wrong — BigQuery reads `my_dataset` as a project ID -SELECT * FROM `my_dataset.customers` LIMIT 1000 +-- ❌ Wrong — the Keboola stage `in` becomes a third (project) segment +SELECT * FROM `in`.`c-main`.`customers` LIMIT 1000 +SELECT * FROM `in.c-main.customers` LIMIT 1000 ``` | Backend | Identifier quoting | Example | | --- | --- | --- | | Snowflake | Double quotes | `"in.c-main"."customers"` | -| BigQuery | Backticks, per segment | `` `in_c_main`.`customers` `` | +| BigQuery | Backticks, `dataset.table` (no stage prefix) | `` `in_c_main`.`customers` `` | ### Reference the dataset by its mangled bucket name @@ -644,6 +646,6 @@ logging.info(f"User {current_user} updated record {record_id} to status {new_sta ## Limitations -- **SQL dialect differs by backend**: On BigQuery you must quote identifiers with backticks per segment and reference datasets by their mangled bucket name — see [Working with the BigQuery Backend](#working-with-the-bigquery-backend). The Query Service does not translate dialects. +- **SQL dialect differs by backend**: On BigQuery you must quote identifiers with backticks and reference datasets by their mangled bucket name (with no stage prefix) — see [Working with the BigQuery Backend](#working-with-the-bigquery-backend). The Query Service does not translate dialects. - **Column-level permissions not supported**: If you grant access to a table, the app can read/write all columns. - **Permission changes require app restart**: If you add or remove tables from the Storage Access configuration, the changes take effect on the next app start (deploy, redeploy, or wake from sleep).