Part of #390.
Summary
Add a purpose-built kbagent transformation command group for authoring and editing SQL transformations, porting create_sql_transformation + update_sql_transformation. Scope: both create and the structured block/code edit engine.
MCP source (keboola-mcp-server @ agent-v1.60.0)
src/keboola_mcp_server/tools/components/tools.py (create_sql_transformation, update_sql_transformation + _internal split) and tools/components/utils.py, tools/components/tf_update.py, tools/components/model.py.
- Create: collapses code blocks into a single block named
"Blocks"; splits SQL into statements; derives the output bucket from the transformation name (out.c-{clean_bucket_name(name)}); maps each created_table_names entry to {source: <table>, destination: out.c-<bucket>.<table>}. Component is chosen from SQL dialect (keboola.snowflake-transformation / keboola.google-bigquery-transformation); POSTs via Storage API configuration_create.
- Edit engine (
tf_update.py, 9 ops): add_block, remove_block, rename_block, add_code, remove_code, rename_code, set_code, add_script, str_replace. Block/code IDs (b{i}, b{i}.c{j}) are synthetic, positional, re-derived at apply time (add_ids) — they are NOT persisted, so a kbagent port can reproduce them with a nested-index walk (no jsonpath_ng needed). Round-trips raw statement-arrays ↔ joined-script strings, validates against the component schema, then configuration_update.
- Both tools are tagged
config-diff-preview (the preview itself is a separate serve endpoint reusing the _internal mutator — out of scope for a CLI-first port; the CLI analog is --dry-run + local diff).
What kbagent already has
client.py create_config / update_config hit the same Storage endpoints config new/config update use — reusable verbatim (incl. schema validation, #-secret encryption, dry-run, branch resolution in config_service.py).
- SQL statement splitter already exists:
sync/sql_split.py split_statements() (state machine handling quotes/$$/comments), already wired into config writes via sync/code_extraction.py normalize_blocks_codes_script() — this is exactly the raw↔simplified conversion the MCP tools do.
- Dialect detection is cheaper in kbagent:
default_backend is already parsed from verify_token (client.py, project_service.py, surfaced in project status). No workspace needed — the MCP path can provision a workspace (up to 300 s, billable). Read default_backend instead.
- SQL-transformation component set already known:
sync/code_extraction.py SQL_TRANSFORMATION_COMPONENTS + is_sql_transformation_component().
- Missing: dialect→component_id map (~5 lines), payload shaper incl.
clean_bucket_name (~25 lines), and the 9-op engine + positional-ID numbering (~300 LOC ported from tf_update.py/utils.py, no jsonpath dependency).
Proposed commands
kbagent transformation create --project P --name NAME (--sql 'SELECT …' | --sql-file PATH | -) \
[--created-table NAME …] [--component-id keboola.snowflake-transformation|keboola.google-bigquery-transformation] \
[--description D] [--folder NAME] [--branch ID] [--dry-run]
kbagent transformation show --project P --config-id ID [--component-id ID] [--branch ID]
# prints the block/code tree with synthetic b{i}/b{i}.c{j} IDs (mirrors MCP get_configs numbering)
kbagent transformation edit --project P --config-id ID [--component-id ID] [--branch ID] \
--change-description TEXT \
(--op '{"op":"set_code","block_id":"b0","code_id":"b0.c0","script":"…"}' … | --op-file ops.json) \
[--storage @storage.json] [--dry-run]
Implementation sketch (3-layer)
- L3 none — reuse
create_config / update_config.
- L2 new
transformation_service.py carrying: dialect→component resolver (from default_backend), the create payload shaper (single block, statement split via split_statements, output mapping via ported clean_bucket_name), and the 9-op engine (positional-index walk + add_ids renumbering + raw↔simplified round-trip). --dry-run returns (original, updated) for a local diff via json_utils.compute_diff.
- L1
commands/transformation.py thin group, wired in cli.py; discriminated-union op model in models.py (mirror TfParamUpdate).
Effort: S (create) + M (edit engine)
The edit engine is the only net-new business logic.
Acceptance criteria
Risks / open questions
- Synthetic-ID UX hazard: IDs renumber after every structural op. Batching
remove_block b0 + set_code b1.c0 operates on pre-mutation IDs within a batch but post-mutation across calls — same sharp edge as the MCP tool. Document the "show before edit" workflow (aligns with kbagent's fresh-fetch guidance).
- Dialect default vs actual:
default_backend is the project default; a non-default workspace backend could differ. Keep --component-id as an explicit override.
- Output bucket coupled to name: renaming the transformation later breaks the bucket-name match;
clean_bucket_name can collide (diacritics stripped, 96-char cap). Keep identical to UI/MCP for parity; --created-table is a declarative hint, not a guarantee — document it.
--storage is full replacement — carry the "include ALL mappings you want to keep" warning.
- Most of the MCP tools' value is the LLM-facing prose (delimited identifiers, FQNs, "create at least one table", schema-change caveats). In kbagent that lives in the agent surfaces (
keboola-expert.md, SKILL.md) — a separate, mandatory doc effort (mind the 62 KB keboola-expert.md budget).
Part of #390.
Summary
Add a purpose-built
kbagent transformationcommand group for authoring and editing SQL transformations, portingcreate_sql_transformation+update_sql_transformation. Scope: both create and the structured block/code edit engine.MCP source (
keboola-mcp-server@agent-v1.60.0)src/keboola_mcp_server/tools/components/tools.py(create_sql_transformation,update_sql_transformation+_internalsplit) andtools/components/utils.py,tools/components/tf_update.py,tools/components/model.py."Blocks"; splits SQL into statements; derives the output bucket from the transformation name (out.c-{clean_bucket_name(name)}); maps eachcreated_table_namesentry to{source: <table>, destination: out.c-<bucket>.<table>}. Component is chosen from SQL dialect (keboola.snowflake-transformation/keboola.google-bigquery-transformation); POSTs via Storage APIconfiguration_create.tf_update.py, 9 ops):add_block,remove_block,rename_block,add_code,remove_code,rename_code,set_code,add_script,str_replace. Block/code IDs (b{i},b{i}.c{j}) are synthetic, positional, re-derived at apply time (add_ids) — they are NOT persisted, so a kbagent port can reproduce them with a nested-index walk (nojsonpath_ngneeded). Round-trips raw statement-arrays ↔ joined-script strings, validates against the component schema, thenconfiguration_update.config-diff-preview(the preview itself is a separate serve endpoint reusing the_internalmutator — out of scope for a CLI-first port; the CLI analog is--dry-run+ local diff).What kbagent already has
client.pycreate_config/update_confighit the same Storage endpointsconfig new/config updateuse — reusable verbatim (incl. schema validation,#-secret encryption, dry-run, branch resolution inconfig_service.py).sync/sql_split.pysplit_statements()(state machine handling quotes/$$/comments), already wired into config writes viasync/code_extraction.pynormalize_blocks_codes_script()— this is exactly the raw↔simplified conversion the MCP tools do.default_backendis already parsed fromverify_token(client.py,project_service.py, surfaced inproject status). No workspace needed — the MCP path can provision a workspace (up to 300 s, billable). Readdefault_backendinstead.sync/code_extraction.pySQL_TRANSFORMATION_COMPONENTS+is_sql_transformation_component().clean_bucket_name(~25 lines), and the 9-op engine + positional-ID numbering (~300 LOC ported fromtf_update.py/utils.py, no jsonpath dependency).Proposed commands
Implementation sketch (3-layer)
create_config/update_config.transformation_service.pycarrying: dialect→component resolver (fromdefault_backend), the create payload shaper (single block, statement split viasplit_statements, output mapping via portedclean_bucket_name), and the 9-op engine (positional-index walk +add_idsrenumbering + raw↔simplified round-trip).--dry-runreturns(original, updated)for a local diff viajson_utils.compute_diff.commands/transformation.pythin group, wired incli.py; discriminated-union op model inmodels.py(mirrorTfParamUpdate).Effort: S (create) + M (edit engine)
The edit engine is the only net-new business logic.
Acceptance criteria
transformation createproduces a valid Snowflake/BigQuery transformation config (dialect fromdefault_backend,--component-idoverride).out.c-<clean(name)>.<table>).transformation showlistsb{i}/b{i}.c{j}IDs;transformation editapplies all 9 ops; IDs renumber after structural ops.--dry-runshows a before/after diff without writing.transformation showbefore each edit" workflow note).Risks / open questions
remove_block b0+set_code b1.c0operates on pre-mutation IDs within a batch but post-mutation across calls — same sharp edge as the MCP tool. Document the "show before edit" workflow (aligns with kbagent's fresh-fetch guidance).default_backendis the project default; a non-default workspace backend could differ. Keep--component-idas an explicit override.clean_bucket_namecan collide (diacritics stripped, 96-char cap). Keep identical to UI/MCP for parity;--created-tableis a declarative hint, not a guarantee — document it.--storageis full replacement — carry the "include ALL mappings you want to keep" warning.keboola-expert.md,SKILL.md) — a separate, mandatory doc effort (mind the 62 KBkeboola-expert.mdbudget).