feat(storage): create-table from a source table + BigQuery partition/clustering by yustme · Pull Request #468 · keboola/cli

yustme · 2026-06-29T10:54:53Z

What

Brings the Keboola connection capability from keboola/connection#7697 (DMD-1677) to the CLI. storage create-table can now create a table by copying a source table into a different BigQuery partition/clustering layout. It also exposes the partition/clustering layout options that the tables-definition endpoint already supported but the CLI never surfaced as flags.

This is the supported way to repartition a populated BigQuery table: copy it into the new layout, then flip it into place with the existing storage swap-tables.

Changes

--source-table-id (+ optional --source-branch-id): derive the new table's schema from a source table and copy its rows into the requested layout. Mutually exclusive with --column (and --not-null/--default, which attach to column specs).
--column is now optional (was required); exactly one of --column / --source-table-id must be given.
BigQuery layout flags (work in both columns and source mode): --time-partitioning-type/-field/-expiration-ms, --range-partitioning-field/-start/-end/-interval (range bounds are strings), --clustering-field (repeatable). Time vs range partitioning are mutually exclusive.
BigQuery-only, with a pre-flight guard: when any source/layout flag is used, the project backend is verified via one token-verify call and a non-BigQuery project fails fast (exit 2, clear message) before the create is issued. Plain --column creates make no extra call. The connection 422 codes (backendDoesNotSupportSourceTable, sourceAliasNotPersisted, sourceTableMissingReferencedColumn, sourceTableNotFound) remain a server-side backstop.
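The flag rules listed above can be sketched as a single validation function. This is only an illustration of the described behaviour, not the PR's actual service code: the function name, parameter names, `UsageError` type, and message wording are all hypothetical. The `--source-branch-id` check reflects the follow-up fix mentioned in the review.

```python
from typing import Optional, Sequence


class UsageError(ValueError):
    """Hypothetical error type; the real CLI has its own error handling."""


def validate_create_table_args(
    columns: Sequence[str] = (),
    source_table_id: Optional[str] = None,
    source_branch_id: Optional[str] = None,
    not_null: Sequence[str] = (),
    default: Sequence[str] = (),
    time_partitioning_type: Optional[str] = None,
    range_partitioning_field: Optional[str] = None,
) -> str:
    """Check the flag combination and return the mode: "source" or "columns"."""
    # Exactly one of --column / --source-table-id must be given.
    if bool(columns) == bool(source_table_id):
        raise UsageError("exactly one of --column / --source-table-id must be given")
    # --not-null / --default attach to column specs, so they make no sense in source mode.
    if source_table_id and (not_null or default):
        raise UsageError("--not-null/--default cannot be combined with --source-table-id")
    # A source branch without a source table would otherwise be silently ignored.
    if source_branch_id and not source_table_id:
        raise UsageError("--source-branch-id requires --source-table-id")
    # Time and range partitioning are mutually exclusive.
    if time_partitioning_type and range_partitioning_field:
        raise UsageError("time and range partitioning are mutually exclusive")
    return "source" if source_table_id else "columns"
```

Running every check before any network call keeps invalid flag combinations from ever reaching the API.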

The client (KeboolaClient.create_table) builds the tables-definition body conditionally — source XOR columns, plus the optional timePartitioning/rangePartitioning/clustering objects — mirroring the exact shapes connection expects.
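A minimal sketch of that conditional body shaping, assuming the top-level keys are `name`, `source`, `columns`, `timePartitioning`, `rangePartitioning`, and `clustering`. Only the last three names appear in this PR; the function name, the `name`/`source`/`columns` keys, and every inner field used in the usage lines are assumptions for illustration, not the shapes connection actually expects.

```python
from typing import Any, Optional


def build_tables_definition_body(
    name: str,
    columns: Optional[list[dict[str, Any]]] = None,
    source: Optional[dict[str, Any]] = None,
    time_partitioning: Optional[dict[str, Any]] = None,
    range_partitioning: Optional[dict[str, Any]] = None,
    clustering: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a request body with source XOR columns plus optional layout objects."""
    if (columns is None) == (source is None):
        raise ValueError("pass exactly one of columns or source")
    body: dict[str, Any] = {"name": name}
    if source is not None:
        body["source"] = source
    else:
        body["columns"] = columns
    # Layout objects are added only when requested, so plain creates are unchanged.
    optional_layout = {
        "timePartitioning": time_partitioning,
        "rangePartitioning": range_partitioning,
        "clustering": clustering,
    }
    for key, value in optional_layout.items():
        if value is not None:
            body[key] = value
    return body


# Usage: the inner field names below are hypothetical placeholders.
body = build_tables_definition_body(
    "events_repart",
    source={"tableId": "in.c-main.events"},
    time_partitioning={"type": "DAY", "field": "created_at"},
    clustering={"fields": ["tenant_id"]},
)
```

Omitting unset layout keys, rather than sending them as null, is one way to keep a plain `--column` request identical to what the CLI sent before this change.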

Example

# Copy a populated table into a DAY-partitioned, tenant-clustered layout...
kbagent storage create-table --project P --bucket-id in.c-main --name events_repart \
  --source-table-id in.c-main.events \
  --time-partitioning-type DAY --time-partitioning-field created_at \
  --clustering-field tenant_id --primary-key id

# ...then swap it into the original's place.
kbagent storage swap-tables --project P --table-id in.c-main.events \
  --target-table-id in.c-main.events_repart --branch <ID> --yes

Tests

New tests/test_storage_create_table.py: client body shaping (source vs columns, partition/clustering, string range bounds), service-layer XOR + partition validation, the BigQuery pre-flight guard (fires before POST; skipped for plain creates), and CLI flag pass-through / --column no longer required.
Backend-aware E2E step in tests/test_e2e.py: on BigQuery runs the source-copy + swap; on other backends asserts the pre-flight guard rejects with exit 2.
Updated existing test_storage_write.py call-signature assertions.
make check green (lint, format, typecheck, skill, version, command-sync, changelog, error-codes, 4198 tests).

Docs / version

Agent surfaces synced: context.py, CLAUDE.md, commands-reference.md, keboola-expert.md (tool matrix), gotchas.md, storage-types-workflow.md, regenerated SKILL.md.
Bumped to 0.66.0 with a changelog entry (make version-sync).

…clustering

Extend `storage create-table` to mirror keboola/connection#7697:

- `--source-table-id` (+ optional `--source-branch-id`): copy an existing table's data into the requested partition/clustering layout instead of building from `--column`. The schema is derived from the source, so `--column`/`--not-null`/`--default` are forbidden. This is the supported way to repartition a populated BigQuery table; pair with `swap-tables`.
- `--column` is now optional and mutually exclusive with `--source-table-id`.
- New BigQuery layout flags (also usable on a plain columns create): `--time-partitioning-type`/`-field`/`-expiration-ms`, `--range-partitioning-field`/`-start`/`-end`/`-interval` (bounds are strings), `--clustering-field`. Time vs range partitioning are mutually exclusive.
- BigQuery-only with a one-call pre-flight guard: when any source/layout flag is used, the project backend is verified first and a non-BigQuery project fails fast (exit 2) before the create. Plain `--column` creates are unaffected. Connection 422 codes remain as a server-side backstop.

Client builds the tables-definition body conditionally (source XOR columns, optional layout). Adds unit tests (client/service/CLI), a backend-aware E2E step, and full agent doc-sync. Bumps to 0.66.0.

devin-ai-integration · 2026-06-29T11:23:14Z

Code Review

Overall a clean, well-structured PR. The 3-layer architecture (command → service → client) is maintained, validations are thorough, test coverage spans all layers, and documentation is updated across all mandatory surfaces. Backward compatibility for plain --column creates is verified by test. CI green.

Findings (all addressed in `920b5bd`)

~~1. --source-branch-id without --source-table-id is silently ignored (Medium)~~ — Fixed: validation added + test (test_source_branch_id_without_source_table_rejected).

~~2. if_not_exists "skipped" path missing new keys (Low)~~ — Fixed: source_table_id, source_branch_id, time_partitioning, range_partitioning, clustering (all None) added to skipped dict + test assertions in test_storage_write.py.

~~3. Range partitioning human display is minimal (Nit)~~ — Fixed: now shows Range partitioning: field [start, end) step interval.

~~4. uv.lock revision 3→2 (Nit)~~ — Fixed: reverted to revision 3.
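For finding 3, the human-readable line could be produced by a helper along these lines. The function name and its parameters are hypothetical; only the output format is taken from the review text. The bounds stay strings, matching how the flags pass them.

```python
def format_range_partitioning(field: str, start: str, end: str, interval: str) -> str:
    """Render range partitioning as a half-open interval with its step."""
    return f"Range partitioning: {field} [{start}, {end}) step {interval}"


# Usage with made-up values:
line = format_range_partitioning("id", "0", "1000", "10")
```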

What's done well

Pre-flight backend guard — fails on non-BigQuery projects before the API call with a clear error message; plain columns creates pay no penalty (no extra API call).
XOR validation --column vs --source-table-id + prohibition of --not-null/--default in source mode — fail-fast with precise error messages.
_build_bigquery_layout and _build_source as pure helper functions separated from service logic.
Test coverage: client body shaping (3), service validation (7), backend guard (4), CLI pass-through (2), E2E (backend-aware path).
Doc sync: context.py, CLAUDE.md, commands-reference.md, keboola-expert.md, gotchas.md, storage-types-workflow.md, SKILL.md — all updated.
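The pre-flight guard praised above might look roughly like the sketch below. It assumes the options arrive as a dict keyed by snake_case flag names, that the backend lookup is injected as a callable (standing in for the single token-verify call), and that a BigQuery backend is reported as the string "bigquery". All of these, along with the exception type, are illustrative assumptions rather than the PR's real code.

```python
from typing import Any, Callable, Mapping

# Option names derived from the new flags; any of them makes the create BigQuery-only.
BIGQUERY_ONLY_OPTIONS = (
    "source_table_id",
    "source_branch_id",
    "time_partitioning_type",
    "time_partitioning_field",
    "time_partitioning_expiration_ms",
    "range_partitioning_field",
    "range_partitioning_start",
    "range_partitioning_end",
    "range_partitioning_interval",
    "clustering_field",
)


class BackendMismatchError(Exception):
    """Hypothetical error; the CLI is described as exiting with code 2 here."""

    exit_code = 2


def preflight_backend_guard(
    options: Mapping[str, Any], fetch_backend: Callable[[], str]
) -> None:
    """Verify the backend only when a source/layout option is actually set."""
    if not any(options.get(key) for key in BIGQUERY_ONLY_OPTIONS):
        return  # plain --column create: no extra API call
    backend = fetch_backend()  # assumed to wrap one token-verify request
    if backend != "bigquery":
        raise BackendMismatchError(
            f"source/layout flags require a BigQuery project, got backend {backend!r}"
        )
```

Injecting the lookup as a callable makes it easy to assert in tests that the guard runs before the create request and is skipped entirely for plain creates.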

All findings addressed. LGTM 👍

- Reject --source-branch-id without --source-table-id (was silently dropped)
- Keep skipped if-not-exists envelope schema-consistent with created path
- Show range partitioning bounds in human output
- Restore uv.lock revision 3 (unrelated downgrade)

yustme mentioned this pull request Jun 29, 2026

feat(web): bulk-remove multiple projects from the serve UI #469

Open

yustme mentioned this pull request Jun 29, 2026

feat(web): repartition BigQuery tables from the table detail UI #470

Open

yustme wants to merge 2 commits into main from feat/create-table-from-source
