feat(web): repartition BigQuery tables from the table detail UI#470

Open
yustme wants to merge 3 commits into feat/create-table-from-source from feat/web-repartition-bq-table

@yustme yustme commented Jun 29, 2026

What

Adds a Repartition tab to the kbagent serve --ui table detail for BigQuery tables — the UI counterpart to the CLI repartition flow shipped in #468. The user picks a partition/clustering layout and the UI runs the supported repartition path end to end.

Stacked on #468. This branch is based on feat/create-table-from-source because it depends on that PR's StorageService.create_table source-copy/partitioning params. Merge #468 first, then this (or rebase onto main once #468 lands).

How it works

The Repartition tab only appears for BigQuery, non-alias tables (gated on the bucket backend). On submit it runs the same two-step flow as the CLI:

  1. create-table --source-table-id — copies the table into a sibling <name>_repartition with the chosen layout (rows copied).
  2. swap-tables — atomically flips the new-layout copy into the original's place.

After the swap the old data/layout lives under the sibling id; the UI then asks whether to delete it (per the chosen behavior — keep as a backup or delete).
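The create → swap → optional delete sequence can be sketched against an in-memory stand-in. This is only an illustration of the ordering, not the actual frontend code: `FakeStorage`, its method names, and the string layout descriptions are hypothetical, and the real flow goes through the server endpoints described in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStorage:
    """In-memory stand-in for the storage API (hypothetical, illustration only)."""

    tables: dict = field(default_factory=dict)  # table id -> layout description

    def create_table_from_source(self, source_id, new_id, layout):
        # Step 1: copy the source into a sibling table that has the new layout.
        if source_id not in self.tables:
            raise KeyError(source_id)
        self.tables[new_id] = layout

    def swap_tables(self, a, b):
        # Step 2: exchange the two tables so the new layout takes the original id.
        self.tables[a], self.tables[b] = self.tables[b], self.tables[a]

    def delete_table(self, table_id):
        del self.tables[table_id]


def repartition(storage, table_id, layout, delete_old=False):
    """Run create, then swap, then optionally delete the leftover old table."""
    sibling = f"{table_id}_repartition"
    storage.create_table_from_source(table_id, sibling, layout)
    storage.swap_tables(table_id, sibling)
    # After the swap, the old data/layout lives under the sibling id.
    if delete_old:
        storage.delete_table(sibling)
    return sibling
```

With `delete_old=False` the sibling stays behind as a backup of the old layout; with `delete_old=True` only the repartitioned table remains.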

Branch behavior: runs in the branch selected in the top bar. With no dev branch selected it resolves the project's default (production) branch and runs there behind an explicit confirm — a dev-branch swap never merges back, so production is the only branch that actually repartitions the live table.
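As a rough sketch of the branch rule above (the real logic lives in the frontend; the function name, parameters, and the choice of `PermissionError` are assumptions made for illustration):

```python
from typing import Optional


def resolve_target_branch(
    selected_branch_id: Optional[str],
    default_branch_id: str,
    production_confirmed: bool,
) -> str:
    """Return the id of the branch the repartition should run in."""
    if selected_branch_id:
        # A dev branch is selected in the top bar: run there.
        return selected_branch_id
    if not production_confirmed:
        # No dev branch selected means production; require the explicit confirm.
        raise PermissionError("repartitioning production requires explicit confirmation")
    return default_branch_id
```

Raising when the confirm is missing keeps the production path impossible to reach by accident in this sketch; the UI expresses the same rule as a confirm dialog.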

Changes

  • Frontend (web/frontend/src/pages/Storage.tsx): new RepartitionTab (time vs integer-range partitioning + clustering field picker), BigQuery gating, branch resolution + production confirm, create→swap orchestration with progress, post-swap delete prompt.
  • Server (server/routers/storage.py): POST /storage/tables/{project} now forwards source_table_id/source_branch_id, time_partitioning_*, range_partitioning_*, clustering_fields; columns is now optional (source mode). Mirrors the CLI.
  • Service (services/storage_service.py): table-detail response now includes the owning bucket's backend so the UI can gate BigQuery-only features.
  • Bumped to 0.67.0 + changelog.
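The gating that the new backend field enables might look roughly like this. The key names `backend` and `is_alias`, and the lowercase `"bigquery"` value, are assumptions about the table-detail payload, not confirmed field names:

```python
def can_repartition(table_detail: dict) -> bool:
    """Show the Repartition tab only for BigQuery, non-alias tables."""
    # An empty-string backend (the service's fallback) never matches, so the tab stays hidden.
    backend = (table_detail.get("backend") or "").lower()
    is_alias = bool(table_detail.get("is_alias"))
    return backend == "bigquery" and not is_alias
```

Treating a missing or empty backend as "not BigQuery" means the tab fails closed when the bucket's backend cannot be determined.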

Tests

  • Server router: new-param passthrough to create_table + columns optional (no 422 on source-only body).
  • Service: backend surfaced from the bucket (+ empty-string fallback).
  • make changelog-check green; ruff/format/ty clean; frontend tsc -b && vite build clean; existing storage/server suites pass.

Not covered here

Full BigQuery E2E of the actual repartition action needs a live BigQuery project; verified at the unit/integration + build level. The bundled _ui_dist is generated at wheel-build time (gitignored), so only frontend source is committed.

Add a Repartition tab to the serve --ui table detail for BigQuery tables.
The user picks a time or integer-range partitioning layout plus optional
clustering fields; the UI runs the supported repartition flow:
create-table --source-table-id (copy rows into new layout) then swap-tables
(atomic flip into place), and finally offers to delete the leftover old
table. Runs in the branch selected in the top bar; with no dev branch it
repartitions production behind an explicit confirm.

Server: POST /storage/tables/{project} now forwards the source-copy +
BigQuery partition/clustering fields and makes columns optional, matching
the CLI. Table detail responses expose the owning bucket's backend so the
UI can gate the BigQuery-only tab.

Bump to 0.67.0 with changelog.
@devin-ai-integration

Code Review

Overall a solid PR: the architecture (create → swap → optional delete) is clean, the tests cover both the new router kwargs and the service-level backend field, and the production confirm dialog is a good safety net. A few things worth attention follow.


1. Partial failure: create OK → swap FAIL → retry does not work (important)

The RepartitionTab mutationFn runs two steps sequentially (create, swap). If create-table succeeds but swap-tables fails:

  • The <name>_repartition table exists in the bucket
  • onError resets phase to "idle" and the user sees an error
  • The user clicks "Repartition" again → create tries to create _repartition AGAIN → fails (table already exists), because if_not_exists is not set

The user is left stuck with no clear way out (they would have to delete _repartition manually).

Suggested fixes (pick one):

  • Add if_not_exists: true to the create body, so a retry skips create and proceeds to swap
  • On partial failure (create OK, swap FAIL), set a dedicated phase (e.g. "swap_failed") and offer a retry of ONLY the swap, or a button to delete _repartition
  • Before create, check whether the _repartition table already exists and, if so, skip straight to swap
// Storage.tsx, line ~362
const createBody: Record<string, unknown> = {
  // ...
  if_not_exists: true,  // safe retry after partial failure
};
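
To make the retry behavior concrete, here is a small simulation of the first two suggestions combined (idempotent create plus a dedicated failure phase). `FakeBackend`, `SwapError`, and `attempt` are hypothetical names for this sketch and do not mirror the actual Storage.tsx code:

```python
class SwapError(RuntimeError):
    """Raised by the fake swap step below."""


class FakeBackend:
    """Stand-in for the create/swap calls; fails the first `fail_swaps` swaps."""

    def __init__(self, fail_swaps: int = 0):
        self.copies_created = 0
        self.copy_exists = False
        self.swapped = False
        self._fail_swaps = fail_swaps

    def create_copy(self, if_not_exists: bool = False) -> None:
        if self.copy_exists:
            if if_not_exists:
                return  # idempotent: reuse the copy left by the failed attempt
            raise RuntimeError("table already exists")
        self.copy_exists = True
        self.copies_created += 1

    def swap(self) -> None:
        if self._fail_swaps > 0:
            self._fail_swaps -= 1
            raise SwapError("swap failed")
        self.swapped = True


def attempt(backend: FakeBackend) -> str:
    """Run create then swap; return the phase the UI should show next."""
    backend.create_copy(if_not_exists=True)
    try:
        backend.swap()
    except SwapError:
        return "swap_failed"  # the copy exists; offer retry or cleanup
    return "done"
```

In this sketch a second `attempt` after a failed swap creates no new copy and goes straight to the swap. Note that the retry reuses whatever copy exists, so the sketch does not notice if the requested layout changed between attempts.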

2. CreateTable model: missing model_validator for mutual exclusivity (minor)

The comment on lines 29-30 says "exactly one of columns / source_table_id is required", but no Pydantic model_validator enforces it. The service layer validates it (storage_service.py:949-958), so it cannot corrupt data, but the error response will be a 500 (ValueError) instead of a clean 422 from Pydantic validation.

If the intent is to leave validation to the service layer (e.g. for consistency with the CLI path), it would be worth at least noting that in a comment on the model. If not:

from pydantic import model_validator

class CreateTable(BaseModel):
    # ...
    @model_validator(mode="after")
    def _check_columns_or_source(self) -> "CreateTable":
        if self.columns and self.source_table_id:
            raise ValueError("columns and source_table_id are mutually exclusive")
        if not self.columns and not self.source_table_id:
            raise ValueError("one of columns or source_table_id is required")
        return self

3. BigQuery clustering limit: the UI does not cap the selection at 4 fields (nit)

BigQuery allows at most 4 clustering fields. The UI allows an unlimited selection; the Storage API will reject it, but the error message will not be very clear. Consider disabling the remaining column buttons once 4 are selected:

// Storage.tsx, line ~608
<button
  // ...
  disabled={!on && clustering.length >= 4}
  onClick={() => toggleCluster(c)}
>
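
The same cap can also be enforced in the toggle handler itself, so the state stays valid even if a button is not disabled. A minimal sketch of that logic, written in Python for brevity; `toggle_cluster` here is an illustrative pure function, not the actual `toggleCluster` from Storage.tsx:

```python
MAX_CLUSTERING_FIELDS = 4  # BigQuery's limit on clustering columns


def toggle_cluster(selected: list[str], column: str) -> list[str]:
    """Return the new selection after clicking `column`."""
    if column in selected:
        # Clicking a selected column always deselects it, even at the cap.
        return [c for c in selected if c != column]
    if len(selected) >= MAX_CLUSTERING_FIELDS:
        return list(selected)  # at the cap: ignore the click
    return [*selected, column]  # order is preserved
```

Appending rather than sorting keeps the selection in click order in this sketch.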

4. Form controls stay interactive during the mutation (nit)

While create+swap is running (busy === true), the submit button is correctly disabled via canSubmit, but the partitioning mode, field selects, and clustering buttons remain clickable. The values are already captured in the closure, so this does not break the logic, but disabling them would be better UX.


5. uv.lock revision 3 → 2

The lockfile revision went backwards (3 → 2). This is probably just a regeneration, but it is worth verifying that it is not an unintended downgrade.


Positives

  • The branch resolution logic is well thought out: automatic resolution of the default branch plus an explicit production warning
  • Tests cover both the forwarded kwargs and columns being optional (no 422)
  • The backend fallback to an empty string in the service is defensive and correct
  • A post-swap delete prompt instead of silent deletion is a good UX pattern

yustme added 2 commits June 29, 2026 22:02
- Idempotent copy + partial-failure recovery: create-table now uses
  if_not_exists, so retrying after a failed swap re-runs just the swap
  instead of erroring; a failed swap surfaces a 'delete leftover copy'
  cleanup action.
- CreateTable request model enforces the columns/source XOR via a
  model_validator -> clean 422 instead of a 500 from the service.
- Cap clustering at BigQuery's max of 4 fields in the picker.
- Disable partitioning/clustering form controls while the create+swap
  mutation is in flight.
- Restore uv.lock revision 3 (sync_version downgraded it under old uv).

Lets the user repartition to no partitioning at all (de-partition a table),
with clustering still optional. 'None' mode sends no time/range partitioning
fields and is always valid on its own.