2 changes: 1 addition & 1 deletion .claude-plugin/marketplace.json
@@ -10,7 +10,7 @@
"plugins": [
{
"name": "kbagent",
"version": "0.65.1",
"version": "0.66.0",
"source": "./plugins/kbagent",
"description": "AI-friendly interface to Keboola Connection projects — explore configs, jobs, lineage, call MCP tools, manage dev branches, and debug SQL in workspaces",
"category": "development"
3 changes: 2 additions & 1 deletion CLAUDE.md
@@ -328,7 +328,8 @@ kbagent storage bucket-detail --project NAME --bucket-id ID [--branch ID]
kbagent storage tables [--project NAME ...] [--bucket-id ID] [--branch ID]
kbagent storage table-detail --project NAME --table-id ID [--branch ID]
kbagent storage create-bucket --project NAME --stage STAGE --name NAME [--description D] [--backend B] [--branch ID]
kbagent storage create-table --project NAME --bucket-id ID --name NAME --column COL:TYPE[(length)] [...] [--primary-key COL] [--not-null COL ...] [--default NAME=VALUE ...] [--branch ID] [--if-not-exists]
kbagent storage create-table --project NAME --bucket-id ID --name NAME [--column COL:TYPE[(length)] ...] [--primary-key COL] [--not-null COL ...] [--default NAME=VALUE ...] [--source-table-id ID] [--source-branch-id N] [--time-partitioning-type DAY|HOUR|MONTH|YEAR] [--time-partitioning-field COL] [--time-partitioning-expiration-ms MS] [--range-partitioning-field COL --range-partitioning-start S --range-partitioning-end E --range-partitioning-interval I] [--clustering-field COL ...] [--branch ID] [--if-not-exists]
# --column XOR --source-table-id (0.66.0+, BigQuery only): --source-table-id copies an existing table's data into the requested partition/clustering layout (schema derived from source) -> swap into place with swap-tables. Partition/clustering flags work in both modes (BigQuery only); time vs range partitioning are mutually exclusive. A non-BigQuery project fails fast (pre-flight backend check).
kbagent storage upload-table --project NAME --table-id ID --file PATH [--incremental] [--branch ID]
kbagent storage download-table --project NAME --table-id ID [--output FILE] [--columns COL ...] [--limit N] [--where-column COL --where-value VAL ... [--where-operator eq|neq]] [--changed-since WHEN] [--changed-until WHEN] [--branch ID]
kbagent storage add-column --project NAME --table-id ID --column COL:TYPE[(length)] [--not-null] [--default VALUE] [--branch ID]
2 changes: 1 addition & 1 deletion plugins/kbagent/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
{
"name": "kbagent",
"version": "0.65.1",
"version": "0.66.0",
"description": "AI-friendly interface to Keboola Connection projects — explore configs, jobs, lineage, call MCP tools, manage dev branches, and debug SQL in workspaces",
"author": {
"name": "Keboola",
1 change: 1 addition & 0 deletions plugins/kbagent/agents/keboola-expert.md
@@ -100,6 +100,7 @@ a critical failure.
| Create typed table with native types | `kbagent storage create-table --column pk:VARCHAR(40) --column amount:NUMBER(18,2) --not-null pk --default amount=0` (0.25.0+) | `tool call create_table` (accepts the same `definition.length` shape via MCP) | re-creating via raw REST to `/v2/storage/...tables-definition` |
| Add one column to an existing table | `kbagent storage add-column --project P --table-id in.c-foo.data --column status:VARCHAR(20) [--not-null] [--default active]` (0.62.0+) -- synchronous Storage endpoint, same `name:TYPE(length)` grammar as `create-table`; the add-side mirror of `delete-column` | -- | re-creating the whole table just to add a field (loses data/PK/dependents); raw `POST /v2/storage/tables/.../columns` |
| Promote typed rebuild back into the original name | `kbagent storage swap-tables --project P --table-id in.c-foo.data --target-table-id in.c-foo.data_change_log --branch <ID> --yes` (0.28.0+) -- async storage job (`tableSwap`); client polls to completion. Service refuses without a branch; any branch incl. prod | -- | renaming or deleting + re-uploading (loses history; downstream configs need to be rewritten) |
| Repartition / recluster a populated BigQuery table | `kbagent storage create-table --project P --bucket-id in.c-main --name events_repart --source-table-id in.c-main.events --time-partitioning-type DAY --time-partitioning-field created_at --clustering-field tenant_id --primary-key id` (0.66.0+, BigQuery only) to copy the data into the new layout, then `kbagent storage swap-tables --table-id in.c-main.events --target-table-id in.c-main.events_repart --branch <ID> --yes` to flip it into place. `--source-table-id` derives the schema from the source so `--column` is forbidden (mutually exclusive); a non-BigQuery project fails fast (pre-flight backend check, exit 2). Range partitioning instead: `--range-partitioning-field/-start/-end/-interval` (all four; bounds are strings; mutually exclusive with time partitioning) | -- | raw `POST /v2/storage/buckets/.../tables-definition` with a `source` object, then manual swap; or a `CREATE TABLE ... AS SELECT` in a workspace (drops NOT NULL + primary key) |
| Re-seed a table without losing its schema / PK / dependents | `kbagent storage truncate-table --project P --table-id in.c-foo.data [--branch ID] [--dry-run] [--yes]` (0.32.0+) -- DELETE `/tables/{id}/rows?allowTruncate=1`; endpoint is uniformly async on every branch (returns a queued `tableRowsDelete` job; client polls via `_wait_for_storage_job`). Do NOT pass `async=true` -- the API rejects it. Batch via repeated `--table-id`. Returns `{truncated[], failed[], dry_run, project_alias}` with `truncated[]` entries carrying `{table_id, rows_before, rows_after, branch_id}`. Permission class: `destructive` | `tool call delete_table_rows` if the upstream MCP exposes it | drop + recreate the table (loses descriptions, PK, sharing edges, and breaks every downstream config reference); deleting rows via raw SQL in a workspace (bypasses the Storage API audit trail) |
| Debug a failed job | `kbagent job detail --project P --job-id J --json` + `kbagent job run ... --log-tail-lines 200` | `kbagent workspace from-transformation` for SQL repro | "I think the issue is..." without reading logs |
| Ad-hoc SQL / row-count / type audit | `kbagent workspace create` + `kbagent workspace load` + `kbagent workspace query --sql "..."` (0.59.0+: results come back inline+fast but **capped at `--limit`, default 500** -- check `statements[].truncated`/`total_rows`, use `COUNT(*)` for counts, `--full` for the complete set) | `kbagent workspace from-transformation` for existing transform debugging; `workspace list --qs-compatible` (0.42.0+, #304) for data-app reuse | trusting a default `SELECT *` as the full result (it is truncated at 500); querying Storage via raw Snowflake credentials outside the workspace abstraction |
2 changes: 1 addition & 1 deletion plugins/kbagent/skills/kbagent/SKILL.md
@@ -198,7 +198,7 @@ When working inside a git repository or project directory, run `kbagent init` (o
| List storage tables from one or more projects | `kbagent storage tables` |
| Show detailed table info including columns and types | `kbagent storage table-detail --project PROJECT --table-id TABLE-ID` |
| Create a new storage bucket | `kbagent storage create-bucket --project PROJECT --stage STAGE --name NAME` |
| Create a new storage table with typed columns | `kbagent storage create-table --project PROJECT --bucket-id BUCKET-ID --name NAME --column COLUMN` |
| Create a new storage table with typed columns | `kbagent storage create-table --project PROJECT --bucket-id BUCKET-ID --name NAME` |
| Upload a CSV file into a storage table | `kbagent storage upload-table --project PROJECT --table-id TABLE-ID --file FILE` |
| Export a storage table to a local CSV file | `kbagent storage download-table --project PROJECT --table-id TABLE-ID` |
| Delete one or more storage tables | `kbagent storage delete-table --project PROJECT --table-id TABLE-ID` |
@@ -103,7 +103,7 @@ Requires a **super-admin** Manage API token (same kind as `org setup`). Same def
- `storage tables [--project NAME ...] [--bucket-id ID] [--branch ID]` -- list tables across all connected projects in parallel (multi-project by default, same as `storage buckets`); repeat `--project` to target a subset; `--bucket-id` is applied independently per project (missing buckets become per-project errors); `--branch` requires exactly one `--project`
- `storage table-detail --project NAME --table-id ID [--branch ID]` -- table detail with columns, types, primary key, row count (branch-aware)
- `storage create-bucket --project NAME --stage STAGE --name NAME [--description D] [--backend B] [--branch ID]` -- create bucket (branch-aware). With `--branch ID` on a project lacking the `storage-branches` feature (legacy fake-branch), response carries `legacy_branch_storage: true` and human mode prints a warning -- the runner will create a parallel `out.c-<branch_id>-*` bucket at job time. See `storage-types-workflow.md`
- `storage create-table --project NAME --bucket-id ID --name NAME --column col:TYPE[(length)] [...] [--primary-key COL] [--not-null COL ...] [--default NAME=VALUE ...] [--branch ID] [--if-not-exists]` -- create typed table. Base types `STRING/INTEGER/NUMERIC/FLOAT/BOOLEAN/DATE/TIMESTAMP` plus native backend types with length (`VARCHAR(40)`, `NUMBER(18,2)`, `TIMESTAMP_TZ`, `VARIANT`, etc.) -- type/length validation delegated to the Storage API. `--not-null` marks a column `nullable=false`; `--default NAME=VALUE` sets a DEFAULT expression (booleans must be lowercase `true`/`false`). In a dev branch, the target bucket is auto-materialized if it has not yet been written to there -- response surfaces this via `auto_created_bucket: bool`. On legacy fake-branch projects (no `storage-branches` feature), `legacy_branch_storage: true` flags that the runner will use a separate `out.c-<branch_id>-*` bucket at job time. `--if-not-exists` (0.47.0+) turns a duplicate-display-name failure into `action: skipped` when the table really exists at the expected id (safe for parallel workers). Since 0.47.1 the skipped envelope reports the EXISTING table's actual `columns`/`primary_key`/`name`, mirrors the request under `requested_columns`/`requested_primary_key`, and sets `schema_drift: true` when they diverge. See `storage-types-workflow.md`
- `storage create-table --project NAME --bucket-id ID --name NAME [--column col:TYPE[(length)] ...] [--primary-key COL] [--not-null COL ...] [--default NAME=VALUE ...] [--source-table-id ID] [--source-branch-id N] [--time-partitioning-type DAY|HOUR|MONTH|YEAR] [--time-partitioning-field COL] [--time-partitioning-expiration-ms MS] [--range-partitioning-field COL --range-partitioning-start S --range-partitioning-end E --range-partitioning-interval I] [--clustering-field COL ...] [--branch ID] [--if-not-exists]` -- create typed table. Base types `STRING/INTEGER/NUMERIC/FLOAT/BOOLEAN/DATE/TIMESTAMP` plus native backend types with length (`VARCHAR(40)`, `NUMBER(18,2)`, `TIMESTAMP_TZ`, `VARIANT`, etc.) -- type/length validation delegated to the Storage API. `--not-null` marks a column `nullable=false`; `--default NAME=VALUE` sets a DEFAULT expression (booleans must be lowercase `true`/`false`). In a dev branch, the target bucket is auto-materialized if it has not yet been written to there -- response surfaces this via `auto_created_bucket: bool`. On legacy fake-branch projects (no `storage-branches` feature), `legacy_branch_storage: true` flags that the runner will use a separate `out.c-<branch_id>-*` bucket at job time. `--if-not-exists` (0.47.0+) turns a duplicate-display-name failure into `action: skipped` when the table really exists at the expected id (safe for parallel workers). Since 0.47.1 the skipped envelope reports the EXISTING table's actual `columns`/`primary_key`/`name`, mirrors the request under `requested_columns`/`requested_primary_key`, and sets `schema_drift: true` when they diverge. **`--source-table-id` (0.66.0+, BigQuery only)** copies an existing table's data into the requested partition/clustering layout instead of building from `--column` (schema derived from source -> `--column`/`--not-null`/`--default` forbidden; the two are mutually exclusive). 
This is the supported way to repartition a populated BigQuery table -- then promote it with `storage swap-tables`. Partition/clustering flags (`--time-partitioning-*`, `--range-partitioning-*`, `--clustering-field`) also work on a plain `--column` create (BigQuery only); time vs range partitioning are mutually exclusive and range bounds are strings. When any source/partition/clustering flag is used, a one-call backend pre-flight rejects non-BigQuery projects (exit 2) before the create. See `storage-types-workflow.md`
- `storage upload-table --project NAME --table-id ID --file PATH [--incremental] [--branch ID]` -- upload CSV (branch-aware)
- `storage download-table --project NAME --table-id ID [--output FILE] [--columns COL ...] [--limit N] [--where-column COL --where-value VAL ... [--where-operator eq|neq]] [--changed-since WHEN] [--changed-until WHEN] [--branch ID]` -- export table to CSV (branch-aware). `--where-column` + `--where-value` (repeatable, OR within the set) + `--where-operator eq|neq` filter rows server-side; `--changed-since`/`--changed-until` (unix ts or strtotime like `-2 days`) filter by import time -- the credential-only, no-workspace way to pull a filtered/incremental slice (0.62.0+)
- `storage add-column --project NAME --table-id ID --column COL:TYPE[(length)] [--not-null] [--default VALUE] [--branch ID]` -- add a single column to an existing table (0.62.0+). Same `name:TYPE(length)` grammar as `create-table --column`; a bare `name` adds an untyped STRING column. Synchronous endpoint (no job to wait on). `--not-null` needs an empty table or a `--default`. Mirror of `delete-column`
27 changes: 27 additions & 0 deletions plugins/kbagent/skills/kbagent/references/gotchas.md
@@ -1534,6 +1534,33 @@ One project failing does not block others. Check the `errors` array:
See [storage-types-workflow.md](storage-types-workflow.md) for the full
type inventory and examples.

## `storage create-table --source-table-id` + partition/clustering are BigQuery-only (since 0.66.0)

- **`--source-table-id` copies an existing table instead of building from `--column`.**
The new table's schema is derived from the source and its rows are copied into the
requested partition/clustering layout (`INSERT … SELECT`, preserving NOT NULL + primary
key). This is the supported way to repartition a populated BigQuery table; promote it
with `storage swap-tables`. Mirrors keboola/connection#7697.
- **`--column` and `--source-table-id` are mutually exclusive.** Supplying both, or
neither, exits 2 (`INVALID_ARGUMENT`) before any API call. `--not-null` / `--default`
attach to `--column` definitions, so they are also rejected in source mode.
- **Partition/clustering flags also work on a plain `--column` create** (BigQuery only):
`--time-partitioning-type` (DAY/HOUR/MONTH/YEAR; required when any `--time-partitioning-*`
is set) + optional `--time-partitioning-field`/`-expiration-ms`; OR
`--range-partitioning-field`/`-start`/`-end`/`-interval` (all four required together).
**Range bounds are strings** in the API, and time vs range partitioning are mutually
exclusive (BigQuery allows one partitioning kind per table).
- **BigQuery-only with a pre-flight guard.** When any source/partition/clustering flag is
used, `create-table` verifies the project backend (one token-verify call) and fails fast
with exit 2 + a clear `… require a BigQuery backend` message on a non-BigQuery project,
before issuing the create. A plain `--column` create makes no extra call. The connection
API also rejects these server-side (422 `storage.tables.backendDoesNotSupportSourceTable`,
`sourceAliasNotPersisted`, `sourceTableMissingReferencedColumn`; 404
`sourceTableNotFound`) as a backstop.
- **Aliases and linked-bucket tables are valid sources.** A persisted alias (materialized
  view) is queryable; a non-persisted alias (project lacks `bigquery-persisted-alias-views`)
  is rejected with HTTP 422.

## Legacy fake-branch storage warning on `--branch` writes (since 0.25.2)

- **What it is.** Projects without the `storage-branches` feature flag use
@@ -302,3 +302,44 @@ Rules:
exchanged on return. Real swaps observed at ~10s on Snowflake.
- The swap is symmetric; there is no rollback besides swapping again
(or aborting the dev branch).

## BigQuery repartition via `create-table --source-table-id` (since 0.66.0)

On **BigQuery** you can produce the repartitioned/re-clustered copy in a
single call instead of hand-writing a CTAS transformation: `create-table`
copies an existing table's data into a new partition/clustering layout,
then `swap-tables` flips it into place.

```bash
# 1. Copy in.c-main.events into a new DAY-partitioned, tenant-clustered table.
# Schema (columns, NOT NULL) is derived from the source -> NO --column.
kbagent storage create-table --project prod --bucket-id in.c-main \
--name events_repart --source-table-id in.c-main.events \
--time-partitioning-type DAY --time-partitioning-field created_at \
--clustering-field tenant_id --primary-key id

# 2. Inspect the copy, then swap it into the original's place.
kbagent storage table-detail --project prod --table-id in.c-main.events_repart
kbagent storage swap-tables --project prod --table-id in.c-main.events \
--target-table-id in.c-main.events_repart --branch <DEFAULT_BRANCH_ID> --yes
```
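The same two steps can be driven from a script. The following is a minimal Python sketch, assuming `kbagent` is on `PATH` and that a table id is `<bucket id>.<table name>` (as in `in.c-main.events`). `repartition_commands` and `repartition` are hypothetical helpers, not part of kbagent, and the script skips the manual `table-detail` inspection, so only run it once the target layout has been checked.

```python
import subprocess


def repartition_commands(
    project: str,
    source_table_id: str,
    new_name: str,
    branch_id: str,
    layout_flags: list[str],
) -> tuple[list[str], list[str]]:
    """Hypothetical helper: build argv for the copy and the swap."""
    # Assumption: table id = "<bucket id>.<table name>", so the bucket is everything
    # before the last dot; the copy lands in the same bucket as the source.
    bucket_id, _, _ = source_table_id.rpartition(".")
    target_table_id = f"{bucket_id}.{new_name}"
    create = [
        "kbagent", "storage", "create-table",
        "--project", project,
        "--bucket-id", bucket_id,
        "--name", new_name,
        "--source-table-id", source_table_id,  # schema comes from the source: no --column
        *layout_flags,  # e.g. partitioning, clustering, --primary-key
    ]
    swap = [
        "kbagent", "storage", "swap-tables",
        "--project", project,
        "--table-id", source_table_id,
        "--target-table-id", target_table_id,
        "--branch", branch_id,
        "--yes",
    ]
    return create, swap


def repartition(
    project: str,
    source_table_id: str,
    new_name: str,
    branch_id: str,
    layout_flags: list[str],
) -> None:
    create, swap = repartition_commands(project, source_table_id, new_name, branch_id, layout_flags)
    # check=True raises CalledProcessError on a non-zero exit (e.g. exit 2 on a
    # non-BigQuery project), so the swap never runs after a failed copy.
    subprocess.run(create, check=True)
    subprocess.run(swap, check=True)
```

Because the swap is symmetric, a mistaken run is undone by swapping the same pair again.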

Rules:
- **BigQuery only.** `--source-table-id` and the partition/clustering flags
(`--time-partitioning-*`, `--range-partitioning-*`, `--clustering-field`)
are rejected on a non-BigQuery project by a one-call backend pre-flight
(exit 2) before the create is issued. The connection API also rejects
them server-side (422) as a backstop.
- **`--column` XOR `--source-table-id`.** The two are mutually exclusive;
the schema in source mode is derived from the source, so `--column` /
`--not-null` / `--default` must not be supplied.
- **Partitioning shapes.** Time partitioning needs `--time-partitioning-type`
(DAY/HOUR/MONTH/YEAR); range partitioning needs all of
`--range-partitioning-field/-start/-end/-interval` (bounds are strings).
Time and range partitioning are mutually exclusive.
- **Sources.** Regular tables, linked-bucket tables, and persisted aliases
  (materialized views) are valid copy sources. A non-persisted alias is
  rejected with HTTP 422.
- The partition/clustering flags also work on a plain `--column` create
(same BigQuery-only rule) when you want a fresh empty table in a specific
layout rather than a copy.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "keboola-cli"
version = "0.65.1"
version = "0.66.0"
description = "AI-friendly CLI for managing Keboola projects"
readme = "README.md"
requires-python = ">=3.12"
16 changes: 16 additions & 0 deletions src/keboola_agent_cli/changelog.py
@@ -24,6 +24,22 @@

# Ordered newest-first. Each value is a list of brief one-line descriptions.
CHANGELOG: dict[str, list[str]] = {
"0.66.0": [
"New: `storage create-table` can copy from an existing table and apply a "
"BigQuery partition/clustering layout. `--source-table-id` (with optional "
"`--source-branch-id`) derives the new table's schema from a source table and "
"copies its rows into the requested layout -- the supported way to repartition "
"a populated BigQuery table, then flip it into place with `storage swap-tables`. "
"`--column` is now optional and mutually exclusive with `--source-table-id`. "
"New layout flags (also usable on a plain columns create): "
"`--time-partitioning-type`/`-field`/`-expiration-ms`, `--range-partitioning-field`/"
"`-start`/`-end`/`-interval`, and `--clustering-field` (repeatable). Time and range "
"partitioning are mutually exclusive. Mirrors keboola/connection#7697.",
"Note: the source-copy and partition/clustering flags are BigQuery-only. "
"`create-table` runs a one-call backend pre-flight (token verify) when any of them "
"is used and fails fast with a clear message on a non-BigQuery project, before "
"issuing the create. A plain columns create is unaffected (no extra call).",
],
"0.65.1": [
"BREAKING: Removed `data-app git-bind-credential` (and its `kbagent serve` endpoint). It shipped in "
"0.65.0 on a misdiagnosis: managed-repo deploys were failing and we believed the platform "