feat(dataapp-developer): AJDA-2840 add BigQuery Direct Storage Access examples #82

Open
sykora-ji wants to merge 4 commits into main from sykorajiri-AJDA-2840

Conversation

@sykora-ji sykora-ji commented Jun 25, 2026

Contributor

link to issue

Description

Extends the dataapp-developer plugin's dataapp-development skill so it stops generating Snowflake-only SQL on BigQuery projects. Previously the skill claimed Direct Storage Access was "Snowflake only" and that the Query Service did not support BigQuery — both untrue. All changes are documentation/skill-content (no runtime code).

references/storage-access.md (main change):

  • New "BigQuery SQL dialect" section: backtick identifier quoting for the two-part dataset.table reference (both `dataset`.`table` and `dataset.table` are valid; the trap is adding a third leading segment, because the Keboola in/out stage stays inside the mangled dataset name and is not a separate segment), bucket→dataset mangling (`.` and `-` become `_`), the fact that only the dataset is mangled (the table name keeps its form), the verified "The project <stage> has not enabled BigQuery" error, and Storage → Overview as the authoritative source for Dataset/Table names.
  • Unified the guidance on the Query Service as the preferred path on both backends; reframed backend routing so sql_dialect selects the SQL syntax to emit, not which API to call. The Storage API workspace-query endpoint is kept as a documented alternative for BigQuery (not legacy).
  • Corrected the return-shape section: verified the Query Service returns string cells on BigQuery too (only the Storage API endpoint returns native types); documented backend-specific column.type casing (Snowflake lowercase vs BigQuery uppercase).
  • Documented verified statement-level rules: INSERT/DML works on BigQuery via the Query Service (rows_affected populated, round-trip confirmed), statements in one execute_query call share a session, and each statement must be exactly one SQL command.
  • Replaced the "Snowflake only" claim in the Read-write Direct Storage Access section.
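
A minimal Python sketch of the mangling and quoting rules listed above. The helper names are hypothetical, and the rule "replace every `.` and `-` with `_`" is inferred from the examples in this PR (in.c-main → in_c_main); Storage → Overview remains the authoritative source for the real Dataset/Table names.

```python
import re


def mangle_bucket_id(bucket_id: str) -> str:
    """Assumed mangling rule: every '.' and '-' in the bucket ID becomes '_'."""
    return re.sub(r"[.-]", "_", bucket_id)


def bigquery_table_ref(bucket_id: str, table_name: str) -> str:
    """Build a two-part `dataset`.`table` reference.

    Only the dataset (the bucket ID) is mangled; the table name keeps its form.
    The in/out stage stays inside the dataset name, never a separate segment.
    """
    dataset = mangle_bucket_id(bucket_id)
    return f"`{dataset}`.`{table_name}`"


print(bigquery_table_ref("in.c-main", "customers"))  # `in_c_main`.`customers`
```

The result can be interpolated into a query such as `SELECT * FROM {ref} LIMIT 10`. A hyphen in the table name survives unchanged because only the dataset segment passes through the mangling helper.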

Consistency across the skill: streamlit-apps.md, dev-workflow.md, and troubleshooting.md aligned to the unified Query Service path with BigQuery quoting notes; the four code templates note the BigQuery quoting/dataset adjustment; TODO.md drops the resolved "Snowflake only" items and records the verified findings.

Versioning: bumped dataapp-developer to 1.3.0 in plugin.json and marketplace.json (new documented capability).

All BigQuery behaviour above was verified live against a real BigQuery project via keboola-query-service (quoting, mangling, read shape, INSERT round-trip). The only remaining untested path is a direct-grant write to a real Storage table from a deployed app (platform end-to-end, not skill behaviour) — recorded in TODO.md.
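
The return-shape finding (string cells from the Query Service on both backends, with backend-specific column.type casing) suggests normalizing the type name before casting. The sketch below is illustrative only: the `columns`/`rows` layout and the type names in `CASTERS` are assumptions, not the documented Query Service response shape.

```python
from typing import Any, Callable

# Hypothetical type names; check the column.type values your backend actually returns.
CASTERS: dict[str, Callable[[str], Any]] = {
    "INTEGER": int,
    "INT64": int,
    "FLOAT": float,
    "FLOAT64": float,
    "BOOLEAN": lambda s: s.lower() == "true",
    "BOOL": lambda s: s.lower() == "true",
}


def coerce_rows(columns: list[dict], rows: list[list]) -> list[list]:
    """Cast string cells to Python types, comparing column types case-insensitively."""
    # Upper-casing makes Snowflake-style lowercase and BigQuery-style uppercase
    # type names hit the same lookup; unknown types stay as strings.
    casters = [CASTERS.get(str(col["type"]).upper(), str) for col in columns]
    return [
        [None if cell is None else cast(cell) for cast, cell in zip(casters, row)]
        for row in rows
    ]


columns = [{"name": "id", "type": "integer"}, {"name": "name", "type": "STRING"}]
print(coerce_rows(columns, [["1", "Ada"], [None, "Bob"]]))
```

Falling back to `str` for unknown types keeps the helper safe when a backend reports a type that the table does not list.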

Release Notes

  • Justification
    • The skill generated Snowflake-specific SQL and claimed BigQuery was unsupported, so on BigQuery projects it produced queries that fail (wrong quoting, wrong dataset names). This corrects the guidance based on verified testing (parent AJDA-2835) and mirrors the public docs shipped in connection-docs#986 ("docs: AJDA-2839 add BigQuery section to Storage Access docs").
    • In plain terms: the AI assistant that helps build Keboola data apps now writes correct database queries for BigQuery customers, not just Snowflake ones.
  • Plans for Customer Communication
    • No customer communication needed. This is internal AI-kit skill/documentation content; no platform feature or API changes.
  • Impact Analysis
    • No runtime impact — documentation/skill-content only. Nothing executes; the change only affects the guidance the assistant follows when generating data-app code. No feature flag. No single-tenant impact.
  • Deployment Plan
    • Merge to main; distributed via the AI-kit marketplace plugin version (dataapp-developer 1.3.0). No stack-by-stack rollout.
  • Rollback Plan
    • Fully reversible by reverting the commit / redeploying the previous plugin version. Not a one-way door.
  • Post-Release Support Plan
    • No monitoring or Support notification required.

… examples

Extend the dataapp-development skill so it stops emitting Snowflake-only SQL
on BigQuery projects.

- storage-access.md: new "BigQuery SQL dialect" section (backtick-per-segment
  quoting, bucket->dataset mangling, only the dataset is mangled, Storage
  Overview as the name source); unify on the Query Service as the preferred
  path on both backends with the Storage API workspace endpoint kept as an
  alternative; verified the Query Service returns string cells on BigQuery too
  (only the Storage API endpoint returns native types); document that INSERT/DML
  works on BigQuery via the Query Service, statements in one call share a
  session, and each statement must be exactly one SQL command.
- streamlit-apps.md, dev-workflow.md, troubleshooting.md: align wording and
  add BigQuery quoting notes.
- templates: note the BigQuery quoting/dataset adjustment.
- TODO.md: drop resolved "Snowflake only" items; record verified BQ findings.
- bump dataapp-developer to 1.3.0 (plugin.json + marketplace.json).
linear Bot commented Jun 25, 2026

AJDA-2840

@sykora-ji sykora-ji requested a review from Copilot June 25, 2026 10:28
@sykora-ji sykora-ji requested a review from MiroCillik June 25, 2026 10:29
@sykora-ji sykora-ji marked this pull request as ready for review June 25, 2026 10:29

Copilot AI left a comment

Contributor

Pull request overview

Updates the dataapp-development skill content in the dataapp-developer plugin to document BigQuery Direct Storage Access and align guidance so BigQuery projects don’t receive Snowflake-only SQL patterns.

Changes:

  • Expanded references/storage-access.md with BigQuery-specific guidance (dialect, dataset naming, Query Service usage/return-shape, and RW access notes).
  • Aligned other references and templates to the unified “prefer Query Service on both backends” guidance, adding BigQuery quoting/dataset-name notes.
  • Bumped dataapp-developer plugin version to 1.3.0 in both plugin and marketplace metadata.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| plugins/dataapp-developer/skills/dataapp-development/TODO.md | Updates validation status notes for BigQuery behavior and removes outdated TODO items. |
| plugins/dataapp-developer/skills/dataapp-development/templates/streamlit/streamlit_app.py | Adds a template comment warning about BigQuery quoting + dataset mangling. |
| plugins/dataapp-developer/skills/dataapp-development/templates/nodejs-app/api/queries.js | Adds BigQuery quoting/dataset guidance near the FQN constant. |
| plugins/dataapp-developer/skills/dataapp-development/templates/duckdb-cache/python/cache.py | Notes BigQuery quoting/dataset mangling for the "edit this SQL" section. |
| plugins/dataapp-developer/skills/dataapp-development/templates/duckdb-cache/nodejs/duck.js | Notes BigQuery quoting/dataset mangling for the "edit this SQL" section. |
| plugins/dataapp-developer/skills/dataapp-development/references/troubleshooting.md | Updates troubleshooting guidance to reflect Query Service preference on both backends. |
| plugins/dataapp-developer/skills/dataapp-development/references/streamlit-apps.md | Aligns Streamlit storage-access guidance to Query Service on both backends + adds BigQuery note. |
| plugins/dataapp-developer/skills/dataapp-development/references/storage-access.md | Main documentation updates: BigQuery dialect, Query Service guidance, return-shape notes, and alternative endpoint. |
| plugins/dataapp-developer/skills/dataapp-development/references/dev-workflow.md | Adds a BigQuery dialect note to the dev-workflow query example context. |
| plugins/dataapp-developer/.claude-plugin/plugin.json | Bumps dataapp-developer version to 1.3.0. |
| .claude-plugin/marketplace.json | Bumps marketplace entry for dataapp-developer to 1.3.0. |

Comment thread plugins/dataapp-developer/skills/dataapp-development/references/storage-access.md Outdated
Address PR review (Copilot): the BigQuery quoting rule was wrong. A
two-part `dataset.table` reference works whether you quote per segment
(`` `dataset`.`table` ``) or as a single pair (`` `dataset.table` ``) —
per-segment quoting is not required. The actual failure is adding a third
leading segment (the Keboola stage `in`/`out`, or splitting the dotted
bucket ID), which BigQuery resolves as a GCP project ("The project <stage>
has not enabled BigQuery").

Verified live against a real BigQuery project (in_c_shared_bucket.cashier-data):
`ds`.`tbl` and `ds.tbl` both succeed; `out`.`ds`.`tbl` and `out.ds.tbl` fail.

Rewrites the rule + examples in storage-access.md and drops the misleading
"per segment" wording from the other references and templates.
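
The corrected rule can be sketched as a small pre-flight check in Python. The function names are hypothetical, and the check assumes neither the mangled dataset name nor the table name contains a literal dot, so counting dot-separated segments after stripping backticks is enough.

```python
def segment_count(ref: str) -> int:
    """Count dot-separated segments, ignoring backtick quoting style."""
    return len(ref.replace("`", "").split("."))


def require_two_part(ref: str) -> str:
    """Return ref unchanged if it is dataset.table; raise on any other segment count."""
    count = segment_count(ref)
    if count != 2:
        raise ValueError(
            f"expected a two-part dataset.table reference, got {count} segment(s): {ref}"
        )
    return ref


# Both quoting styles pass; a leading stage segment is rejected.
for candidate in ("`ds`.`tbl`", "`ds.tbl`", "`out`.`ds`.`tbl`", "`out.ds.tbl`"):
    try:
        require_two_part(candidate)
        print(candidate, "ok")
    except ValueError as err:
        print(candidate, "rejected:", err)
```

Rejecting the reference before it is sent turns the BigQuery "project has not enabled BigQuery" error into a local message that names the actual mistake.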

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 9 comments.

Comment thread plugins/dataapp-developer/skills/dataapp-development/references/storage-access.md Outdated
Comment thread plugins/dataapp-developer/skills/dataapp-development/references/storage-access.md Outdated
Comment thread plugins/dataapp-developer/skills/dataapp-development/references/storage-access.md Outdated
Comment thread plugins/dataapp-developer/.claude-plugin/plugin.json
Address PR review (Copilot):
- Replace ambiguous "no stage prefix" with "the in/out stage stays inside the
  mangled dataset name, not a separate segment" across storage-access.md,
  dev-workflow.md, and the four templates. The mangled name keeps in_/out_
  (e.g. `in_c_main`); the rule is no extra stage *segment*.
- Bump dataapp-developer README "## Version" to 1.3.0 to match plugin.json
  and marketplace.json.
@sykora-ji

Contributor Author

Thanks — second review round addressed in 515c5b7 (plus a PR-description update). All nine comments were valid:

"no stage prefix" was ambiguous (7 comments). Correct — the mangled dataset name keeps the stage (in_c_main, out_c_analysis), so "no stage prefix" could be misread as dropping in_/out_. Reworded everywhere to: the in/out stage stays inside the single mangled dataset name, never a separate segment. Fixed in storage-access.md (the forward-pointer, the sql_dialect routing line, and the comparison table), the four templates, and also dev-workflow.md (same wording, not flagged but corrected for consistency).

README version (1 comment). Bumped dataapp-developer/README.md "## Version" to 1.3.0 to match plugin.json and marketplace.json.

PR description conflict (1 comment). Updated the PR description — removed the stale "backtick-per-segment / never around the whole FQN" phrasing so it matches the corrected section (both `dataset`.`table` and `dataset.table` are valid; the rule is no extra leading stage segment).

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Comment thread plugins/dataapp-developer/skills/dataapp-development/references/storage-access.md Outdated
Comment thread plugins/dataapp-developer/skills/dataapp-development/references/storage-access.md Outdated
- Note the JS SDK method name `executeQuery` alongside Python's `execute_query`
  in the shared-session statement rule.
- Use the full 3-part Snowflake FQN in the dialect comparison table to match
  the doc's own "always use the fully-qualified name" rule (the BigQuery row
  stays 2-part, which is correct for BigQuery).
@sykora-ji

Contributor Author

Third round addressed in 772bcba — both valid:

  • execute_query is Python-specific (storage-access.md). Added the JS name alongside it: "in one execute_query (Python) / executeQuery (JS) call".
  • Snowflake example omitted the database prefix (dialect table). Changed the Snowflake row to the full 3-part FQN "KBC_REGION_PROJID"."in.c-main"."customers" to match the doc's "always use the fully-qualified name" rule. The BigQuery row stays 2-part (`in_c_main`.`customers`), which is correct for BigQuery — no project/database segment.
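
The two rows of the dialect table can be sketched as one Python helper in which sql_dialect selects only the SQL syntax. The function name, its parameters, and the accepted dialect strings ("snowflake", "bigquery", compared case-insensitively) are assumptions for illustration; the BigQuery mangling rule is inferred from the in.c-main → in_c_main example.

```python
from typing import Optional


def table_fqn(
    sql_dialect: str,
    bucket_id: str,
    table: str,
    snowflake_db: Optional[str] = None,
) -> str:
    """Return the table reference in the syntax the given dialect expects."""
    dialect = sql_dialect.lower()
    if dialect == "bigquery":
        # Two-part reference: mangled dataset plus unmangled table, no project segment.
        dataset = bucket_id.replace(".", "_").replace("-", "_")
        return f"`{dataset}`.`{table}`"
    if dialect == "snowflake":
        # Three-part fully-qualified name: database, bucket ID as schema, table.
        if not snowflake_db:
            raise ValueError("Snowflake needs the database name for a full FQN")
        return f'"{snowflake_db}"."{bucket_id}"."{table}"'
    raise ValueError(f"unsupported sql_dialect: {sql_dialect}")


print(table_fqn("Snowflake", "in.c-main", "customers", "KBC_REGION_PROJID"))
print(table_fqn("BigQuery", "in.c-main", "customers"))
```

Keeping the API call out of this helper mirrors the routing described in the PR: the dialect changes the emitted SQL, not which service runs it.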

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.

2 participants