Skip to content

feat(transformation): kbagent transformation group — create + structured edit (port create/update_sql_transformation) #396

Description

@padak

Part of #390.

Summary

Add a purpose-built kbagent transformation command group for authoring and editing SQL transformations, porting create_sql_transformation + update_sql_transformation. Scope: both create and the structured block/code edit engine.

MCP source (keboola-mcp-server @ agent-v1.60.0)

  • src/keboola_mcp_server/tools/components/tools.py (create_sql_transformation, update_sql_transformation + _internal split) and tools/components/utils.py, tools/components/tf_update.py, tools/components/model.py.
  • Create: collapses code blocks into a single block named "Blocks"; splits SQL into statements; derives the output bucket from the transformation name (out.c-{clean_bucket_name(name)}); maps each created_table_names entry to {source: <table>, destination: out.c-<bucket>.<table>}. Component is chosen from SQL dialect (keboola.snowflake-transformation / keboola.google-bigquery-transformation); POSTs via Storage API configuration_create.
  • Edit engine (tf_update.py, 9 ops): add_block, remove_block, rename_block, add_code, remove_code, rename_code, set_code, add_script, str_replace. Block/code IDs (b{i}, b{i}.c{j}) are synthetic, positional, re-derived at apply time (add_ids) — they are NOT persisted, so a kbagent port can reproduce them with a nested-index walk (no jsonpath_ng needed). Round-trips raw statement-arrays ↔ joined-script strings, validates against the component schema, then configuration_update.
  • Both tools are tagged config-diff-preview (the preview itself is a separate serve endpoint reusing the _internal mutator — out of scope for a CLI-first port; the CLI analog is --dry-run + local diff).

What kbagent already has

  • client.py create_config / update_config hit the same Storage endpoints config new/config update use — reusable verbatim (incl. schema validation, #-secret encryption, dry-run, branch resolution in config_service.py).
  • SQL statement splitter already exists: sync/sql_split.py split_statements() (state machine handling quotes/$$/comments), already wired into config writes via sync/code_extraction.py normalize_blocks_codes_script() — this is exactly the raw↔simplified conversion the MCP tools do.
  • Dialect detection is cheaper in kbagent: default_backend is already parsed from verify_token (client.py, project_service.py, surfaced in project status). No workspace needed — the MCP path can provision a workspace (up to 300 s, billable). Read default_backend instead.
  • SQL-transformation component set already known: sync/code_extraction.py SQL_TRANSFORMATION_COMPONENTS + is_sql_transformation_component().
  • Missing: dialect→component_id map (~5 lines), payload shaper incl. clean_bucket_name (~25 lines), and the 9-op engine + positional-ID numbering (~300 LOC ported from tf_update.py/utils.py, no jsonpath dependency).

Proposed commands

kbagent transformation create --project P --name NAME (--sql 'SELECT …' | --sql-file PATH | -) \
    [--created-table NAME …] [--component-id keboola.snowflake-transformation|keboola.google-bigquery-transformation] \
    [--description D] [--folder NAME] [--branch ID] [--dry-run]

kbagent transformation show --project P --config-id ID [--component-id ID] [--branch ID]
    # prints the block/code tree with synthetic b{i}/b{i}.c{j} IDs (mirrors MCP get_configs numbering)

kbagent transformation edit --project P --config-id ID [--component-id ID] [--branch ID] \
    --change-description TEXT \
    (--op '{"op":"set_code","block_id":"b0","code_id":"b0.c0","script":"…"}' … | --op-file ops.json) \
    [--storage @storage.json] [--dry-run]

Implementation sketch (3-layer)

  • L3 none — reuse create_config / update_config.
  • L2 new transformation_service.py carrying: dialect→component resolver (from default_backend), the create payload shaper (single block, statement split via split_statements, output mapping via ported clean_bucket_name), and the 9-op engine (positional-index walk + add_ids renumbering + raw↔simplified round-trip). --dry-run returns (original, updated) for a local diff via json_utils.compute_diff.
  • L1 commands/transformation.py thin group, wired in cli.py; discriminated-union op model in models.py (mirror TfParamUpdate).

Effort: S (create) + M (edit engine)

The edit engine is the only net-new business logic.

Acceptance criteria

  • transformation create produces a valid Snowflake/BigQuery transformation config (dialect from default_backend, --component-id override).
  • Output-table mapping matches MCP (out.c-<clean(name)>.<table>).
  • transformation show lists b{i}/b{i}.c{j} IDs; transformation edit applies all 9 ops; IDs renumber after structural ops.
  • --dry-run shows a before/after diff without writing.
  • Unit tests (each op + round-trip) + E2E test.
  • Doc-sync surfaces per Epic: Port keboola-mcp-server agent tools into native kbagent commands #390 checklist (incl. the "fetch fresh IDs via transformation show before each edit" workflow note).

Risks / open questions

  • Synthetic-ID UX hazard: IDs renumber after every structural op. Batching remove_block b0 + set_code b1.c0 operates on pre-mutation IDs within a batch but post-mutation across calls — same sharp edge as the MCP tool. Document the "show before edit" workflow (aligns with kbagent's fresh-fetch guidance).
  • Dialect default vs actual: default_backend is the project default; a non-default workspace backend could differ. Keep --component-id as an explicit override.
  • Output bucket coupled to name: renaming the transformation later breaks the bucket-name match; clean_bucket_name can collide (diacritics stripped, 96-char cap). Keep identical to UI/MCP for parity; --created-table is a declarative hint, not a guarantee — document it.
  • --storage is full replacement — carry the "include ALL mappings you want to keep" warning.
  • Most of the MCP tools' value is the LLM-facing prose (delimited identifiers, FQNs, "create at least one table", schema-change caveats). In kbagent that lives in the agent surfaces (keboola-expert.md, SKILL.md) — a separate, mandatory doc effort (mind the 62 KB keboola-expert.md budget).

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions