diff --git a/plugins/kbagent/skills/kbagent-cicd-migration/SKILL.md b/plugins/kbagent/skills/kbagent-cicd-migration/SKILL.md new file mode 100644 index 00000000..78b2240b --- /dev/null +++ b/plugins/kbagent/skills/kbagent-cicd-migration/SKILL.md @@ -0,0 +1,173 @@ +--- +name: kbagent-cicd-migration +description: > + Use when migrating an existing kbc (keboola-as-code) GitHub CI/CD pipeline to + the new kbagent (keboola-agent-cli) sync engine. Covers: converting per-project + pull/push PR workflows, multi-project repos (e.g. L0/L1 dev->prod promotion), + branch->environment mapping, GitHub secrets/variables/environments setup, the + install step (uv tool install instead of downloading a Go binary), and the + kbc->kbagent command/flag/env-var mapping. Triggers: migrate CI/CD, migrate + pipeline, kbc to kbagent, port GitHub Actions, CLI-based-sync-demo, kbc pull + push CI, project-as-code CI migration, replace kbc binary in CI, gitops + migration, multi-project promotion, dev to prod Keboola, KBC_STORAGE_API_TOKEN + to KBC_TOKEN, sync push CI, sync pull CI. +--- + +# kbc -> kbagent CI/CD Migration + +Guides a customer through porting a `kbc` GitHub CI/CD pipeline (the +[CLI-based-sync-demo](https://github.com/keboola/CLI-based-sync-demo) shape: +per-project pull/push, multi-project promotion, branch-gated deploys) to the new +`kbagent sync` engine, emitting **clean kbagent-native workflows**. + +## Reality check — this is a one-time BREAKING migration, not a command swap + +Do **not** tell the user they can just swap `kbc` for `kbagent` in place. Three +hard incompatibilities make this a deliberate cutover (verified against the code): + +1. **The on-disk layout/format is different and incompatible.** + - `kbc` writes per config: `config.json` + `meta.json` + `description.md` (JSON). 
+ - `kbagent` writes per config: **`_config.yml`** (YAML, with `name`/`description`/ + `parameters` hoisted + a `_configuration_extra` block) + extracted code files + (`constants.py:425`, `sync/config_format.py`). + - The first `kbagent sync pull` therefore **rewrites every configuration** into a + new format. The old `config.json`/`meta.json` files are **not read** by kbagent + and become orphans that must be deleted. Expect a **massive reformatting diff**. +2. **kbagent sync is an ORCHESTRATOR, not cwd-per-folder.** `kbc pull` runs against + whatever directory you `cd` into. `kbagent sync pull` *requires* `--project ALIAS` + (resolved from a central config store) or `--all-projects` (`sync.py:67,495`). In + CI we bridge this with env-injection: `KBAGENT_PROJECT_FROM_ENV=1` synthesizes a + project under the reserved alias `__env__`, and every command passes + `--project __env__ --directory `. +3. **The two tools cannot co-own the same tree.** Because the source-of-truth files + differ, you cannot have `kbc push` and `kbagent push` both treating one directory + as canonical. You must cut over. + +### Consequence (this is what the user correctly anticipated) +- The migration is a **one-time conversion commit** on a **dedicated branch**, where + the JSON tree is replaced by the YAML tree. Reviewing that diff line-by-line is + impractical; you verify by **behavior** (`sync diff` clean, dry-run push empty), not + by reading the reformat. Merging it to `main` is effectively a tooling version bump. +- Until cutover, keep the legacy `kbc` workflows; afterwards delete them in the same PR. + +> **Live-verified (project 153, GCP europe-west3, 2026-06):** +> - `KBAGENT_PROJECT_FROM_ENV=1` + `--project __env__` works for `init` and `pull`. +> - kbc pulled **143 `config.json` + 187 `meta.json`** (JSON); kbagent pulled +> **147 `_config.yml`** (YAML). Zero overlap in file format. 
+> - `sync init --adopt-existing` on a **kbc-produced tree** succeeds, but the very +> next `sync diff` reports **`0 to create, 0 to update, 136 to delete`** — kbagent +> does not read kbc's `config.json` at all, so it sees every existing config as a +> local deletion. **A `sync push --allow-delete` here would delete all 136 remote +> configs.** Adopt-existing adopts only the manifest, NOT the configs. +> - The correct path — adopt → `sync pull --force` (writes `_config.yml`) → `git rm` +> the orphaned `config.json`/`meta.json` — converged the diff from 136 deletes to +> ~2 (plus ~9 remote-only scheduler/variables configs that pull/diff treat +> inconsistently — verify these per project). Both file sets coexist after pull +> until you delete the kbc files, so the `git rm` step is mandatory, not optional. +> +> **Operator rule:** never run `sync push` against an adopted-but-not-yet-pulled kbc +> tree. Always `sync pull --force` first, confirm `sync diff` is clean, THEN enable +> the push lane. + +### What still carries over unchanged +The *orchestration shell* is CLI-agnostic: manual/scheduled pull that commits state, +PR validation, GitHub-Environment-gated push, branch→env mapping, per-project loops. +Only three mechanics change: **install** (`uv tool install` not a binary download), +**commands/flags** (`kbagent sync ...`, see +[references/command-mapping.md](references/command-mapping.md)), and **auth env vars** +(`KBAGENT_PROJECT_FROM_ENV=1` + `KBC_TOKEN` + `KBC_STORAGE_API_URL`). + +## Workflow + +### Step 1 — Analyze the existing repo (always start here) +Run the engine in dry-run mode to inventory projects and the legacy CI it replaces: + +```bash +python /scripts/migrate_cicd.py /path/to/repo +``` + +It prints every project found (one per `.keboola/manifest.json`), each project's +id / stack host / required token secret name, any `ignoredComponents` ("subset of +a project" — see Step 5), and the legacy `kbc` workflow/action files it supersedes. 
+ +### Step 2 — Pick a version pin (decide before generating) +- **Pinned (recommended for prod lanes):** `--version 0.58.0` (PyPI, once published) + or `--git-ref v0.58.0` (git tag, until PyPI exists). Reproducible CI. +- **Unpinned (`keboola-agent-cli`, resolves to latest):** only acceptable for a + non-prod/scratch lane. Warn the user: unpinned + the current auto-update behavior + means non-deterministic CI runs. + +### Step 3 — Generate the clean workflows +```bash +python /scripts/migrate_cicd.py /path/to/repo --write \ + --version 0.58.0 --main-branch main --schedule "0 * * * *" +``` +Produces: +- `.github/workflows/kbagent-validate.yml` — on PR: `sync diff` + `sync push --dry-run` per project (read-only drift + secret-encryption preflight). +- `.github/workflows/kbagent-pull.yml` — manual + optional cron: `sync pull --force` per project, commits state back. +- `.github/workflows/kbagent-push.yml` — manual, **GitHub-Environment-gated**: `sync push` per project, with an `allow_delete` input. + +The legacy `kbc` files are **left in place** — review the new ones, then delete the +old workflows/actions in the same PR. + +### Step 3b — Perform the one-time config conversion (dedicated branch) +This is the breaking part. 
On a fresh migration branch, for each project, convert +the JSON tree to kbagent's YAML tree and remove the orphaned kbc files: + +```bash +git checkout -b migrate/kbc-to-kbagent +# Per project (do ONE non-prod project first and verify): +KBAGENT_PROJECT_FROM_ENV=1 KBC_TOKEN=$L0_TOKEN KBC_STORAGE_API_URL=https://connection.keboola.com \ + kbagent sync init --adopt-existing --project __env__ --directory L0 +# --force is required here: it rewrites the adopted tree into _config.yml. +KBAGENT_PROJECT_FROM_ENV=1 KBC_TOKEN=$L0_TOKEN KBC_STORAGE_API_URL=https://connection.keboola.com \ + kbagent sync pull --force --project __env__ --directory L0 +# Remove orphaned kbc JSON files that kbagent no longer reads. Use a plain +# `git rm` (not `--cached`): files left on disk would be re-staged by `git add -A`. +find L0 \( -name config.json -o -name meta.json \) -exec git rm -q {} + +git add -A +``` +Verify by **behavior**, not by reading the reformat diff: a follow-up +`sync diff --project __env__ -d L0` must be clean and `sync push --dry-run` empty. +Only then repeat for the remaining projects and the production lane. + +### Step 4 — Set up GitHub secrets, variables, environments +The engine prints exact `gh` commands. The model: **one Storage API token secret +per project** (`KBC_TOKEN_`), and two **Environments** (`prod`, `dev`) so +prod pushes require approval. See [references/secrets-setup.md](references/secrets-setup.md) +for the full mapping from the old `secrets.KBC_SAPI_TOKEN_*` / `vars.KBC_*` scheme. + +### Step 5 — Confirm scope ("subset of a project") and branching +- **Subset:** if a project should only sync part of its config tree, set + `ignoredComponents` (and/or `allowedBranches`) in that project's + `.keboola/manifest.json` — `kbagent sync` honors both, exactly like `kbc`. +- **Branching:** the old model used a fixed branch id per env. The new model maps + git branch → Keboola dev branch via `.keboola/branch-mapping.json` + + `kbagent sync branch-link`. For PR-based promotion this is usually *better*: + a PR branch links to a Keboola dev branch, `main` pushes to production. 
Add + `--git-branching` to annotate, and walk the user through `branch-link` if they + want per-PR isolated dev branches. If they want to keep the simple single-branch + (production) model, leave branch-mapping at the default (null = production). + +### Step 6 — Validate before merging +- Open the migration PR; the `kbagent-validate` workflow runs `sync diff` — confirm + the diff is empty (no unintended drift) against each project. +- Manually run `kbagent-pull` once and confirm the committed state matches the + converted tree from Step 3b: `git diff` should be tiny. Do not compare against + what `kbc pull` produced; the two layouts are incompatible (JSON vs `_config.yml`). +- Do a `kbagent-push` dry-run (the validate workflow already does this) and read + the planned changes before the first real gated push. + +## Guardrails (state these to the user) +- **Never** add `--allow-plaintext-on-encrypt-failure` to CI push — it silently + uploads `#`-secrets in cleartext if the Encryption API is down. The generated + push is fail-closed by design. +- `sync push --allow-delete` deletes remote configs removed locally. It is wired to + the `allow_delete` workflow input (default off). Treat it like the old `--force`. +- Tokens live **only** in GitHub secrets and are injected as env vars per step; the + generated workflows never write a `config.json` to disk. + +## Reference material +- [references/migration-runbook.md](references/migration-runbook.md) — **the ordered PR sequence / cutover plan** (pre-flight → conversion PR → start-over). Use this when the user asks "which PRs, what order, how do I cut over." +- [references/branching-model.md](references/branching-model.md) — **how to choose** single-branch (Model A) vs git-branching (Model B), with a decision table. +- [references/command-mapping.md](references/command-mapping.md) — kbc ↔ kbagent commands, flags, env vars. +- [references/secrets-setup.md](references/secrets-setup.md) — secrets/vars/environments migration table + `gh` setup. 
+- `scripts/migrate_cicd.py` — the analyzer + generator (stdlib only). diff --git a/plugins/kbagent/skills/kbagent-cicd-migration/references/branching-model.md b/plugins/kbagent/skills/kbagent-cicd-migration/references/branching-model.md new file mode 100644 index 00000000..65d02942 --- /dev/null +++ b/plugins/kbagent/skills/kbagent-cicd-migration/references/branching-model.md @@ -0,0 +1,67 @@ +# Choosing a branching model + +kbagent supports two ways to map your git repo to Keboola branches. Pick one up +front — it changes how `sync init` is run and what `push` touches. + +## The two models + +### Model A — Single-branch / production-direct +- Each project directory maps to **one Keboola project's production branch**. +- `.keboola/branch-mapping.json` stays at its default (`null` = production). +- `sync pull` / `sync push` read and write that project's production branch directly. +- Promotion across environments (dev → prod project) is a **git merge** between + project directories/repos, then a `push` to the next project (the demo's L0→L1 + shape). + +**Choose A when:** +- You're experimenting, or driving a single dev project locally instead of kbc. +- Your "environments" are *separate Keboola projects* (L0/L1), not dev branches. +- You want the simplest mental model and fewest moving parts. + +**Trade-off:** a `push` writes straight to the production branch of that project — +there is no isolated staging copy inside Keboola. Review happens in git, not in KBC. + +### Model B — Git-branching (Keboola dev-branch isolation) +- `sync init --git-branching` creates `.keboola/branch-mapping.json`. +- Each **git branch** links to a **Keboola development branch** (an isolated server- + side copy) via `kbagent sync branch-link --branch-name `. +- Work on a PR branch → `push` lands in its Keboola dev branch (safe, isolated); + merge to `main` → `push` lands in production. +- `kbagent sync branch-status` shows the mapping; `branch-unlink` detaches. 
+ +**Choose B when:** +- Multiple people open PRs against the same project and you want each change tested + in isolation inside Keboola before it hits production. +- You already use Keboola's development-branches feature. +- You want PRs to never write production directly. + +**Trade-off:** more lifecycle to manage (create/link/unlink dev branches, clean them +up), and the mapping file is per-clone state. + +## Decision shortcut + +| Your situation | Model | +|---|---| +| "I just want to manipulate one project with kbagent instead of kbc" | **A** (start here) | +| Separate dev/prod **projects** promoted by git merge (L0/L1) | **A** | +| PR-per-change, multiple contributors, want isolated server-side testing | **B** | +| You rely on Keboola development branches today | **B** | + +You can start on **A** and adopt **B** later: run `sync init --git-branching` and +`branch-link` when you actually need per-PR isolation. Moving A→B is additive (it adds +a mapping file); it does not require re-converting the config tree. + +## How the model shows up in commands + +```bash +# Model A (production-direct) — nothing special: +kbagent sync pull --project -d +kbagent sync push --project -d + +# Model B (git-branching): +kbagent sync init --git-branching --project -d +git checkout -b feature/x +kbagent sync branch-link --project -d --branch-name feature/x +kbagent sync pull/push --project -d # now targets the dev branch +kbagent sync branch-status --project -d +``` diff --git a/plugins/kbagent/skills/kbagent-cicd-migration/references/command-mapping.md b/plugins/kbagent/skills/kbagent-cicd-migration/references/command-mapping.md new file mode 100644 index 00000000..7fa4e2f1 --- /dev/null +++ b/plugins/kbagent/skills/kbagent-cicd-migration/references/command-mapping.md @@ -0,0 +1,59 @@ +# kbc ↔ kbagent command / flag / env mapping + +Authoritative mapping used by the migration generator. 
Verify flags against your +installed `kbagent` version (`kbagent sync pull --help`); the new CLI evolves fast. + +## Install + +| kbc (old) | kbagent (new) | +|---|---| +| Download Go binary zip from `keboola/keboola-as-code` GitHub release, unzip to `/usr/local/bin/kbc` | `uv tool install keboola-agent-cli==` (PyPI) or `uv tool install 'git+https://github.com/keboola/cli@'` | +| `kbc --version` | `kbagent version` | +| Custom `install` composite action | `astral-sh/setup-uv@v5` + one `uv tool install` line | + +## Core sync commands + +| kbc (old) | kbagent (new) | Notes | +|---|---|---| +| `kbc init -d DIR --allow-target-env` | `kbagent sync init --directory DIR [--adopt-existing]` | `--adopt-existing` reuses an existing `.keboola/manifest.json` written by `kbc`; it adopts only the manifest, not the configs, so run `sync pull --force` next | +| `kbc persist -d DIR` | *(folded into `sync pull`)* | No separate persist step; pull writes manifest + new objects | +| `kbc pull -d DIR --force` | `kbagent sync pull --directory DIR --force` | `--force` overrides local-vs-remote conflicts (3-way diff) | +| `kbc push -d DIR` | `kbagent sync push --directory DIR` | Encrypts `#`-secrets fail-closed before write | +| `kbc push -d DIR --force` | `kbagent sync push --directory DIR --allow-delete` | `--allow-delete` removes remote configs deleted locally | +| `kbc push --dry-run` / push-dry action | `kbagent sync push --dry-run --directory DIR` | Shows planned changes without writing | +| `kbc diff -d DIR` | `kbagent sync diff --directory DIR [--json]` | `--json` gives structured drift for CI gating | +| `kbc status` | `kbagent sync status --directory DIR` | | +| `kbc validate` (JSON-schema) | *(no direct equivalent — gap)* | Use `sync diff` for drift; schema validation is not ported | + +## Auth / environment variables + +| kbc (old) | kbagent (new) | Notes | +|---|---|---| +| `KBC_STORAGE_API_TOKEN` | `KBC_TOKEN` | Storage API token | +| `KBC_STORAGE_API_HOST` (bare host) | `KBC_STORAGE_API_URL` (full URL) | `connection.keboola.com` → 
`https://connection.keboola.com` | +| *(implicit)* | `KBAGENT_PROJECT_FROM_ENV=1` | **Required** opt-in so kbagent synthesizes an ephemeral project from the env in CI (no `config.json` on disk). See `constants.py:163`, `config_store.py:193` | +| `KBC_PROJECT_ID`, `KBC_BRANCH_ID`, `KBC_BRANCHES` | *(from manifest + branch-mapping)* | Project id comes from `.keboola/manifest.json`; branch from `.keboola/branch-mapping.json` | + +## Branching + +| kbc (old) | kbagent (new) | +|---|---| +| Fixed `KBC_BRANCH_ID` per env; `allowedBranches` in manifest | `.keboola/branch-mapping.json` (git branch → Keboola branch id; `null` = production) managed by `kbagent sync branch-link / branch-unlink / branch-status` | +| Branch dir under repo (`main/`) | Same on-disk layout; mapping decides which Keboola branch a git branch targets | + +## Subset of a project + +Both CLIs honor manifest-level scoping — no command change needed: + +- `allowedBranches: [""]` — restrict which branches sync. +- `ignoredComponents: ["keboola.foo", ...]` — exclude component types. + +`kbagent` parses both (`sync/manifest.py:120`). Additionally, `sync pull` flags +`--skip-storage` / `--skip-jobs` / `--with-table-samples` control how much +*metadata* (beyond configs) is pulled — orthogonal to the config subset. + +## What has NO clean port (call out to the user) +- `kbc validate` JSON-schema validation. +- `kbc ci workflows` generator itself (this skill replaces it). +- Templates / dbt / CI-scaffold subsystems (`kbc template`, `kbc dbt`) — keep `kbc` + for those; they are out of scope for sync CI/CD. 
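The auth mapping in the table above can be sketched as a small translation step. This is a minimal illustration, not part of the generator: the function name `kbc_env_to_kbagent` is hypothetical, and it assumes the legacy environment carries exactly the two `kbc` variables listed in the table.

```python
from __future__ import annotations


def kbc_env_to_kbagent(env: dict[str, str]) -> dict[str, str]:
    """Translate legacy kbc auth variables into the kbagent env-injection set.

    Hypothetical helper for illustration; the variable names come from the
    mapping table, everything else is an assumption.
    """
    host = env["KBC_STORAGE_API_HOST"].strip()
    # kbc takes a bare host; kbagent expects a full URL.
    if host.startswith(("http://", "https://")):
        url = host
    else:
        url = f"https://{host}"
    return {
        # Opt-in that makes kbagent synthesize the ephemeral CI project.
        "KBAGENT_PROJECT_FROM_ENV": "1",
        "KBC_TOKEN": env["KBC_STORAGE_API_TOKEN"],
        "KBC_STORAGE_API_URL": url,
    }


legacy = {
    "KBC_STORAGE_API_TOKEN": "example-token",
    "KBC_STORAGE_API_HOST": "connection.keboola.com",
}
print(kbc_env_to_kbagent(legacy))
```

The legacy `KBC_PROJECT_ID` and `KBC_BRANCH_ID` variables are deliberately not translated, since the table says those values now come from the manifest and the branch mapping.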
diff --git a/plugins/kbagent/skills/kbagent-cicd-migration/references/migration-runbook.md b/plugins/kbagent/skills/kbagent-cicd-migration/references/migration-runbook.md new file mode 100644 index 00000000..cacfa695 --- /dev/null +++ b/plugins/kbagent/skills/kbagent-cicd-migration/references/migration-runbook.md @@ -0,0 +1,93 @@ +# Migration runbook — kbc → kbagent (PR sequence) + +The ordered, low-risk way to cut a repo over. This is a **clean cutover**, not a +coexistence: kbc (`config.json`/`meta.json`) and kbagent (`_config.yml`) cannot both +own the same tree (live-verified — see the SKILL.md reality note). Plan it as one +conversion PR plus housekeeping, then everyone re-branches from the new `main`. + +## Answer to "can I transition seamlessly to a new branch?" +No co-existence, but yes a controlled cutover: +1. One **conversion PR** flips the whole repo from JSON→YAML + swaps the workflows. +2. Merge it to `main`. +3. **Delete/redo every old branch** — they carry the incompatible kbc layout and can + never cleanly merge into the converted `main`. +4. Everyone branches fresh from the new `main` with the new workflows. + +## Pre-flight (do once, before any PR) +- [ ] **Announce a change freeze** on the repo + the Keboola projects for the + conversion window. Any config edit made in the UI between "pull" and "cutover" + becomes drift you'll chase. Keep it short. +- [ ] **Pick a kbagent version** and pin it (`keboola-agent-cli==X.Y.Z` or + `git+...@vX.Y.Z`). Never unpinned on a prod lane. +- [ ] **Set GitHub secrets**: one `KBC_TOKEN_` per project (see + `references/secrets-setup.md`). +- [ ] **Create GitHub Environments** `dev` + `prod`; add required reviewers to `prod`. +- [ ] **Inventory** with the skill's analyzer (dry-run): confirm every project and the + legacy files it will replace. 
+ `python /scripts/migrate_cicd.py /path/to/repo` + +## PR 1 — Conversion (the big one) → branch `migrate/kbc-to-kbagent` +Do the projects one at a time; **start with a non-prod project (e.g. `dev`/`L0`)**. + +Per project `` (with its token in the env): +```bash +export KBAGENT_PROJECT_FROM_ENV=1 KBC_TOKEN=$TOKEN \ + KBC_STORAGE_API_URL=https:// +kbagent sync init --adopt-existing --project __env__ --directory +kbagent sync pull --force --project __env__ --directory # writes _config.yml +# Drop the orphaned kbc files kbagent does not read: +find \( -name config.json -o -name meta.json \) -exec git rm -q {} + +# VERIFY BY BEHAVIOR (must be clean before you trust the project): +kbagent sync diff --project __env__ --directory +``` +Acceptance for each project: `sync diff` shows `0 to create, 0 to update, 0 to delete`. +- If a handful of **scheduler / variables** configs show as "to create" (a known + wrinkle from the live test), pull again and confirm they settle; if they persist, + note them in the PR and reconcile manually before enabling push. + +Then, still on the same branch: +```bash +# Generate the clean kbagent-native workflows: +python /scripts/migrate_cicd.py /path/to/repo --write --version X.Y.Z +# Remove the legacy kbc CI (the analyzer listed these): +git rm -r .github/actions/kbc_* .github/workflows/KBC_*.yml # adjust to your repo +git add -A && git commit -m "Migrate kbc -> kbagent: convert configs + workflows" +``` + +**Review this PR by behavior, not by diff.** The reformat touches hundreds of files; +reading it line-by-line is pointless. Trust: +- the `kbagent-validate` workflow runs on the PR and `sync diff` is clean per project; +- a `sync push --dry-run` (also in validate) reports no changes. + +Merge to `main` once validate is green. + +## PR 2 — Housekeeping (optional, after merge) +- [ ] Branch protection on `main`; require the `kbagent-validate` check. +- [ ] Tune the pull schedule cron / push approval reviewers. 
+- [ ] Update the repo README to the new install + commands. +- [ ] Decide the branching model (next section). + +## After merge — start over +- [ ] **Close or recreate every open PR** that was based on the kbc layout. They diff + against JSON files that no longer exist; rebasing them is not worth it — redo the + change on a fresh branch from the converted `main`. +- [ ] **Delete stale feature branches** (`git push origin --delete `). +- [ ] Tell contributors to **re-clone or hard-reset** to the new `main`. +- [ ] First real push: run `kbagent push` (workflow_dispatch) to `dev` first, approve, + verify in the Keboola UI, then to `prod`. + +## Branching model — pick one +| Model | When | How | +|---|---|---| +| **Single-branch (production)** | Each git branch/project maps straight to a production Keboola project (the demo's L0/L1 promotion) | Leave `.keboola/branch-mapping.json` at default (null = production); gate prod pushes via the GitHub Environment | +| **Git-branching (dev isolation)** | You want each PR to deploy to an isolated Keboola dev branch, then merge to prod on merge-to-main | `kbagent sync init --git-branching`; `kbagent sync branch-link --project __env__ --branch-name ` in the PR workflow; `main` maps to production | + +For most teams already doing PR-per-change promotion, **git-branching** is the closer +fit and is safer (no direct prod writes from PRs). Migrate to it in PR 2, not PR 1. + +## Hard guardrails (repeat to the user) +- **Never** `kbagent sync push` against an adopted-but-not-yet-pulled kbc tree — it + reports every config as "to delete" (136 in the live test) and `--allow-delete` + would wipe the project. +- **Never** `--allow-plaintext-on-encrypt-failure` in CI. +- Keep the change freeze until `main` is converted and the first dev push is verified. 
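The first guardrail can be enforced mechanically before the push lane is enabled. The sketch below is an assumption-heavy illustration: it supposes that `sync diff` prints a summary in the form quoted from the live test (`0 to create, 0 to update, 136 to delete`), and the helper names `parse_diff_summary` and `push_is_safe` are hypothetical. If your installed version offers `--json`, gating on that structured output is likely the more robust choice.

```python
from __future__ import annotations

import re

# Summary shape quoted in the runbook; the exact wording is an assumption
# about the CLI's human-readable output.
SUMMARY = re.compile(r"(\d+) to create, (\d+) to update, (\d+) to delete")


def parse_diff_summary(output: str) -> tuple[int, int, int]:
    """Return (create, update, delete) counts from captured diff output."""
    match = SUMMARY.search(output)
    if match is None:
        # Fail closed: an unreadable summary must never count as "clean".
        raise ValueError("no diff summary found in output")
    create, update, delete = (int(group) for group in match.groups())
    return create, update, delete


def push_is_safe(output: str) -> bool:
    """True only when the diff reports nothing to create, update, or delete."""
    return parse_diff_summary(output) == (0, 0, 0)


adopted_but_not_pulled = "0 to create, 0 to update, 136 to delete"
print(push_is_safe(adopted_but_not_pulled))
```

Raising on a missing summary, rather than returning `False` silently, keeps a changed output format from being mistaken for a clean tree.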
diff --git a/plugins/kbagent/skills/kbagent-cicd-migration/references/secrets-setup.md b/plugins/kbagent/skills/kbagent-cicd-migration/references/secrets-setup.md new file mode 100644 index 00000000..5b07a97f --- /dev/null +++ b/plugins/kbagent/skills/kbagent-cicd-migration/references/secrets-setup.md @@ -0,0 +1,52 @@ +# GitHub secrets / variables / environments setup + +The legacy CLI-based-sync-demo split config across **repo secrets**, **repo +variables**, and **GitHub Environments**. The kbagent model is simpler: one token +secret per project, the stack URL baked from each manifest, environments only for +push approval. + +## Migration table + +| Legacy (kbc demo) | Type | kbagent (new) | Type | Notes | +|---|---|---|---|---| +| `secrets.KBC_SAPI_TOKEN_L0` | secret | `secrets.KBC_TOKEN_L0` | secret | One per project; injected as `KBC_TOKEN` | +| `secrets.KBC_SAPI_TOKEN_L1` | secret | `secrets.KBC_TOKEN_L1` | secret | | +| `vars.KBC_SAPI_HOST` | variable | *(baked from manifest `apiHost`)* | — | Override per project in the generated `env:` block if you use a non-default stack | +| `vars.KBC_PROJECT_ID_L0/L1` | variable | *(from `.keboola/manifest.json`)* | — | No longer a CI variable | +| `vars.KBC_BRANCH_ID_L0/L1` | variable | *(from `.keboola/branch-mapping.json`)* | — | Only if using git-branching mode | +| Environments `prod` / `dev` | environment | Environments `prod` / `dev` | environment | **Keep** — used for push approval gating | + +## Setup with `gh` + +```bash +REPO=/ + +# One Storage API token per project (use environment-scoped secrets for prod): +gh secret set KBC_TOKEN_L0 --repo "$REPO" # paste project 9996 token +gh secret set KBC_TOKEN_L1 --repo "$REPO" # paste project 9997 token + +# Environments for approval gating: +gh api -X PUT "repos/$REPO/environments/dev" +gh api -X PUT "repos/$REPO/environments/prod" +``` + +Then in the GitHub UI (or via the environments API): +1. Scope `KBC_TOKEN_*` for production projects to the **prod** environment. 
+2. Add **required reviewers** to the `prod` environment so `kbagent push` to prod + blocks on manual approval (this replaces the demo's environment gating). +3. Optionally restrict the `prod` environment to the `main` branch. + +## Why no token in config.json +`kbagent` can read a committed `.kbagent/config.json` with multiple project +aliases, but that file stores tokens — unsafe to commit. In CI we instead set +`KBAGENT_PROJECT_FROM_ENV=1` + `KBC_TOKEN` + `KBC_STORAGE_API_URL` per step, so the +token exists only as a masked GitHub secret in the runner's env, never on disk. +This is the direct, safer analog of the demo's per-project `KBC_SAPI_TOKEN_*` +secret model. + +## Security guardrails +- Do **not** commit `.kbagent/config.json` with tokens (the new CLI auto-writes a + `.gitignore` for its config dir — `config_store.py:359`). +- Do **not** pass `--allow-plaintext-on-encrypt-failure` in CI. +- Prefer environment-scoped secrets + required reviewers for any lane that pushes + to a production project. diff --git a/plugins/kbagent/skills/kbagent-cicd-migration/scripts/migrate_cicd.py b/plugins/kbagent/skills/kbagent-cicd-migration/scripts/migrate_cicd.py new file mode 100755 index 00000000..70d08f64 --- /dev/null +++ b/plugins/kbagent/skills/kbagent-cicd-migration/scripts/migrate_cicd.py @@ -0,0 +1,406 @@ +#!/usr/bin/env python3 +"""Migrate a kbc (keboola-as-code) GitHub CI/CD repo to kbagent (keboola-agent-cli). + +This is the engine the ``kbagent-cicd-migration`` skill drives. It: + + 1. Discovers every Keboola project in the repo by locating ``.keboola/manifest.json`` + files (supports the multi-project layout, e.g. ``L0/``, ``L1/``). + 2. Reads each manifest's ``project.id`` / ``project.apiHost`` / + ``allowedBranches`` / ``ignoredComponents`` so generated workflows are + project-accurate and the "subset of a project" lever is surfaced. + 3. Detects the legacy kbc CI/CD it will replace (``.github/workflows`` + + ``.github/actions`` referencing ``kbc``). + 4. 
Emits **clean kbagent-native** GitHub Actions workflows + (validate / pull / push) that use ``uv tool install`` + ``kbagent sync`` + with the env-injection auth model — no per-CLI composite actions, no + committed tokens. + 5. Prints the exact GitHub **secrets + variables + environments** the new + workflows need, with copy-paste ``gh`` CLI commands. + +Stdlib only. Dry-run by default; pass ``--write`` to write files. + +Usage: + python migrate_cicd.py [--write] \\ + [--version 0.58.0 | --git-ref vX.Y.Z] \\ + [--main-branch main] [--schedule "0 * * * *"] \\ + [--git-branching] + +Examples: + # Inspect what would change (no writes): + python migrate_cicd.py ../CLI-based-sync-demo + + # Generate workflows pinned to a published PyPI version: + python migrate_cicd.py ../CLI-based-sync-demo --write --version 0.58.0 + + # Pin to a git tag instead (no PyPI release yet): + python migrate_cicd.py ../CLI-based-sync-demo --write --git-ref v0.58.0 +""" + +from __future__ import annotations + +import argparse +import json +import re +import sys +from dataclasses import dataclass, field +from pathlib import Path + +# --------------------------------------------------------------------------- # +# Project discovery +# --------------------------------------------------------------------------- # + + +@dataclass +class Project: + """A single Keboola project discovered via its manifest.""" + + alias: str # derived from the directory name, uppercased for secret naming + directory: str # path relative to repo root (".", "L0", "L1", ...) + project_id: str + api_host: str + allowed_branches: list[str] = field(default_factory=list) + ignored_components: list[str] = field(default_factory=list) + + @property + def token_secret(self) -> str: + return f"KBC_TOKEN_{self.alias}" + + @property + def stack_url(self) -> str: + # apiHost in the manifest is bare ("connection.keboola.com"); kbagent's + # KBC_STORAGE_API_URL wants a full URL. 
+ host = self.api_host.strip() + if host.startswith(("http://", "https://")): + return host + return f"https://{host}" + + +def _alias_from_dir(directory: str) -> str: + name = Path(directory).name or "PROJECT" + return re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_").upper() or "PROJECT" + + +def discover_projects(repo: Path) -> list[Project]: + """Find every ``.keboola/manifest.json`` and parse it into a Project.""" + projects: list[Project] = [] + for manifest_path in sorted(repo.glob("**/.keboola/manifest.json")): + project_dir = manifest_path.parent.parent + rel = project_dir.relative_to(repo).as_posix() + rel = "." if rel == "" else rel + try: + data = json.loads(manifest_path.read_text(encoding="utf-8")) + except (OSError, json.JSONDecodeError) as exc: + print(f" ! skipping {manifest_path}: {exc}", file=sys.stderr) + continue + proj = data.get("project", {}) + projects.append( + Project( + alias=_alias_from_dir(rel), + directory=rel, + project_id=str(proj.get("id", "")), + api_host=str(proj.get("apiHost", "")), + allowed_branches=[str(b) for b in data.get("allowedBranches", [])], + ignored_components=[str(c) for c in data.get("ignoredComponents", [])], + ) + ) + return projects + + +def detect_legacy_ci(repo: Path) -> list[str]: + """Return a list of legacy kbc CI/CD files that the migration supersedes.""" + found: list[str] = [] + gh = repo / ".github" + if not gh.exists(): + return found + for path in sorted(gh.glob("**/*")): + if not path.is_file() or path.suffix not in {".yml", ".yaml"}: + continue + try: + text = path.read_text(encoding="utf-8", errors="ignore") + except OSError: + continue + # A legacy file is one that invokes the kbc binary or its env vars. 
+ if re.search(r"\bkbc\s+(pull|push|init|persist|diff|validate)\b", text) or ( + "KBC_STORAGE_API_TOKEN" in text + ): + found.append(path.relative_to(repo).as_posix()) + return found + + +# --------------------------------------------------------------------------- # +# Workflow generation (clean kbagent-native) +# --------------------------------------------------------------------------- # + + +def _install_steps(version: str | None, git_ref: str | None) -> str: + if git_ref: + spec = f"git+https://github.com/keboola/cli@{git_ref}" + elif version: + spec = f"keboola-agent-cli=={version}" + else: + # Unpinned: only acceptable for non-production lanes. The skill warns. + spec = "keboola-agent-cli" + return ( + " - name: Install uv\n" + " uses: astral-sh/setup-uv@v5\n" + " - name: Install kbagent\n" + f" run: uv tool install '{spec}'\n" + " - name: Show version\n" + " run: kbagent version\n" + ) + + +def _project_step(p: Project, command: str, step_name: str) -> str: + """Render one per-project step using the env-injection auth model. + + kbagent sync is an orchestrator over registered project aliases, NOT a + cwd-per-folder tool like kbc. In CI we synthesize an ephemeral project from + the env (KBAGENT_PROJECT_FROM_ENV=1 -> reserved alias ``__env__``) and pass + ``--project __env__`` explicitly. A first idempotent ``init --adopt-existing`` + registers the committed manifest before the real command. 
+ """ + return ( + f" - name: {step_name} ({p.alias})\n" + " env:\n" + ' KBAGENT_PROJECT_FROM_ENV: "1"\n' + f" KBC_TOKEN: ${{{{ secrets.{p.token_secret} }}}}\n" + f" KBC_STORAGE_API_URL: {p.stack_url}\n" + " run: |\n" + f" kbagent sync init --adopt-existing --project __env__ --directory '{p.directory}' || true\n" + f" kbagent sync {command} --project __env__ --directory '{p.directory}'\n" + ) + + +def gen_validate(projects: list[Project], main_branch: str) -> str: + diff_steps = "".join(_project_step(p, "diff --json", f"Diff {p.directory}") for p in projects) + dry_run_steps = "".join( + _project_step(p, "push --dry-run", f"Push dry-run {p.directory}") for p in projects + ) + return ( + "# Generated by kbagent-cicd-migration. Clean kbagent-native CI.\n" + "name: kbagent validate\n" + "on:\n" + " pull_request:\n" + " workflow_dispatch:\n" + "permissions:\n" + " contents: read\n" + "jobs:\n" + " validate:\n" + " runs-on: ubuntu-latest\n" + " steps:\n" + " - uses: actions/checkout@v4\n" + f"{_install_steps_placeholder()}" + " # Show drift between the committed config files and each remote\n" + " # project. `sync diff` is read-only. The push dry-run below also\n" + " # surfaces secret-encryption problems before a real push.\n" + f"{diff_steps}" + f"{dry_run_steps}" + ) + + +def gen_pull(projects: list[Project], main_branch: str, schedule: str | None) -> str: + on_block = " workflow_dispatch:\n" + if schedule: + on_block += f" schedule:\n - cron: '{schedule}'\n" + steps = "".join(_project_step(p, "pull --force", f"Pull {p.directory}") for p in projects) + return ( + "# Generated by kbagent-cicd-migration. 
Pulls remote state into git.\n" + "name: kbagent pull\n" + "on:\n" + f"{on_block}" + "permissions:\n" + " contents: write\n" + "jobs:\n" + " pull:\n" + " runs-on: ubuntu-latest\n" + " steps:\n" + " - uses: actions/checkout@v4\n" + f"{_install_steps_placeholder()}" + f"{steps}" + " - name: Commit pulled state\n" + " run: |\n" + " git config user.name 'Keboola kbagent'\n" + " git config user.email 'kbagent@users.noreply.github.com'\n" + " git add -A\n" + " git commit -m \"Automatic kbagent pull $(date -u +%Y-%m-%dT%H:%M:%SZ)\" || echo 'no changes'\n" + " git push\n" + ) + + +def gen_push(projects: list[Project], main_branch: str, git_branching: bool) -> str: + # GitHub Environments gate production approvals; the prod environment maps to + # the main branch, mirroring the legacy `github.ref_name == 'main'` logic. + env_expr = f"${{{{ github.ref_name == '{main_branch}' && 'prod' || 'dev' }}}}" + # The workflow_dispatch boolean arrives as the string 'true'/'false'; the GH + # expression maps it to the --allow-delete flag (opt-in deletion of remote + # configs that were removed locally). + delete_expr = "${{ github.event.inputs.allow_delete == 'true' && '--allow-delete' || '' }}" + steps = "".join( + _project_step(p, f"push {delete_expr}", f"Push {p.directory}") for p in projects + ) + return ( + "# Generated by kbagent-cicd-migration. Pushes git state to Keboola.\n" + "# Protected by a GitHub Environment so prod pushes require approval.\n" + "name: kbagent push\n" + "on:\n" + " workflow_dispatch:\n" + " inputs:\n" + " allow_delete:\n" + " description: 'Delete remote configs removed locally'\n" + " type: boolean\n" + " default: false\n" + "permissions:\n" + " contents: read\n" + "jobs:\n" + " push:\n" + f" environment: {env_expr}\n" + " runs-on: ubuntu-latest\n" + " steps:\n" + " - uses: actions/checkout@v4\n" + f"{_install_steps_placeholder()}" + " # `sync push` encrypts #-secrets fail-closed by default. 
Do NOT add\n" + " # --allow-plaintext-on-encrypt-failure in CI.\n" + f"{steps}" + ) + + +# Install steps are injected after generation so the version/ref is applied once. +_INSTALL_TOKEN = "@@INSTALL@@\n" + + +def _install_steps_placeholder() -> str: + return _INSTALL_TOKEN + + +# --------------------------------------------------------------------------- # +# Secrets / variables checklist +# --------------------------------------------------------------------------- # + + +def secrets_report(projects: list[Project], repo_slug: str) -> str: + lines: list[str] = [] + lines.append("Required GitHub secrets (per project Storage API token):") + for p in projects: + lines.append( + f" gh secret set {p.token_secret} " + f"--repo {repo_slug} # project {p.project_id} ({p.directory})" + ) + lines.append("") + lines.append("Required GitHub Environments (for `kbagent push` approval gating):") + lines.append(f" gh api -X PUT repos/{repo_slug}/environments/prod") + lines.append(f" gh api -X PUT repos/{repo_slug}/environments/dev") + lines.append( + " # Then scope each KBC_TOKEN_* secret to its environment and add " + "required reviewers to 'prod' in the GitHub UI." + ) + lines.append("") + lines.append( + "No KBC_STORAGE_API_URL secret needed: it is baked from each manifest's " + "apiHost. Override per project by editing the generated env: block." 
+ ) + return "\n".join(lines) + + +# --------------------------------------------------------------------------- # +# Orchestration +# --------------------------------------------------------------------------- # + + +def _guess_repo_slug(repo: Path) -> str: + config = repo / ".git" / "config" + if config.exists(): + m = re.search(r"github\.com[:/]([^/]+/[^/\s.]+)", config.read_text(errors="ignore")) + if m: + return m.group(1) + return "/" + + +def run(args: argparse.Namespace) -> int: + repo = Path(args.repo_dir).resolve() + if not repo.is_dir(): + print(f"error: {repo} is not a directory", file=sys.stderr) + return 2 + + projects = discover_projects(repo) + if not projects: + print( + f"error: no .keboola/manifest.json found under {repo}. " + "Is this a kbc project-as-code repo?", + file=sys.stderr, + ) + return 2 + + print(f"Discovered {len(projects)} project(s) in {repo}:") + for p in projects: + subset = "" + if p.ignored_components: + subset = f" [subset: {len(p.ignored_components)} ignored component(s)]" + print( + f" - {p.directory:<8} id={p.project_id:<8} host={p.api_host}" + f" token=secrets.{p.token_secret}{subset}" + ) + + legacy = detect_legacy_ci(repo) + print(f"\nLegacy kbc CI/CD files detected ({len(legacy)}):") + for f in legacy: + print(f" - {f}") + if legacy: + print( + " NOTE: these are NOT deleted. Review the new workflows, then remove " + "the legacy ones in the same PR." 
+ ) + + install = _install_steps(args.version, args.git_ref) + files = { + ".github/workflows/kbagent-validate.yml": gen_validate(projects, args.main_branch), + ".github/workflows/kbagent-pull.yml": gen_pull(projects, args.main_branch, args.schedule), + ".github/workflows/kbagent-push.yml": gen_push( + projects, args.main_branch, args.git_branching + ), + } + files = {k: v.replace(_INSTALL_TOKEN, install) for k, v in files.items()} + + print(f"\nGenerated workflows ({'WRITING' if args.write else 'dry-run, use --write'}):") + for rel, content in files.items(): + target = repo / rel + print(f" - {rel} ({len(content.splitlines())} lines)") + if args.write: + target.parent.mkdir(parents=True, exist_ok=True) + target.write_text(content, encoding="utf-8") + + print("\n" + "=" * 70) + print(secrets_report(projects, _guess_repo_slug(repo))) + print("=" * 70) + + if not args.write: + print("\nDry-run only. Re-run with --write to create the files above.") + return 0 + + +def main(argv: list[str] | None = None) -> int: + ap = argparse.ArgumentParser( + description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter + ) + ap.add_argument("repo_dir", help="Path to the kbc project-as-code repo to migrate") + ap.add_argument( + "--write", action="store_true", help="Write the generated workflows (default: dry-run)" + ) + grp = ap.add_mutually_exclusive_group() + grp.add_argument("--version", help="Pin kbagent to this PyPI version, e.g. 0.58.0") + grp.add_argument("--git-ref", help="Pin kbagent to a git tag/ref, e.g. v0.58.0 (no PyPI yet)") + ap.add_argument( + "--main-branch", + default="main", + help="Branch that maps to the prod environment (default: main)", + ) + ap.add_argument( + "--schedule", default=None, help="Cron for scheduled pull, e.g. 
'0 * * * *' (default: none)" + ) + ap.add_argument( + "--git-branching", action="store_true", help="Annotate for git-branch->Keboola-branch mode" + ) + return run(ap.parse_args(argv)) + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/tests/test_sync_cli_behavior.py b/tests/test_sync_cli_behavior.py new file mode 100644 index 00000000..2beed17d --- /dev/null +++ b/tests/test_sync_cli_behavior.py @@ -0,0 +1,145 @@ +"""Behavioral CLI tests for the sync command surface. + +These tests pin the *command behaviors* that make sync safe and that distinguish +the kbagent orchestrator model from kbc's cwd-per-folder model: + + 1. ``sync pull/diff/push`` REQUIRE ``--project ALIAS`` or ``--all-projects`` + (kbagent resolves projects from a central config store, it does not act on + the current directory implicitly). Missing/contradictory selection is a + usage error (exit 2) and must NOT reach the service. + 2. ``--project`` and ``--all-projects`` are mutually exclusive. + 3. ``--branch`` is per-project, so it cannot combine with ``--all-projects``. + 4. ``sync push --dry-run`` must call the service in dry-run mode and never + perform a real write. + +Background: a side-by-side comparison against kbc showed that pulls round-trip to +zero drift and that push is last-write-wins; these guards are the first line of +defense against an accidental wrong-target or whole-tree operation. 
+""" + +from pathlib import Path +from unittest.mock import MagicMock, patch + +from typer.testing import CliRunner + +from keboola_agent_cli.cli import app +from keboola_agent_cli.config_store import ConfigStore +from keboola_agent_cli.models import ProjectConfig +from keboola_agent_cli.services.project_service import ProjectService + +TEST_TOKEN = "901-55555-fakeTestTokenDoNotUseXXXXXXXX" +runner = CliRunner() + + +def _store(config_dir: Path) -> ConfigStore: + config_dir.mkdir(parents=True, exist_ok=True) + store = ConfigStore(config_dir=config_dir) + store.add_project( + "prod", + ProjectConfig( + stack_url="https://connection.keboola.com", + token=TEST_TOKEN, + project_name="prod", + project_id=1234, + ), + ) + return store + + +def _invoke(args: list[str], tmp_path: Path) -> tuple[int, MagicMock]: + """Invoke the CLI with a mocked SyncService; return (exit_code, mock).""" + store = _store(tmp_path / "config") + mock_sync = MagicMock() + with ( + patch("keboola_agent_cli.cli.ConfigStore") as MockStore, + patch("keboola_agent_cli.cli.ProjectService") as MockProj, + patch("keboola_agent_cli.cli.SyncService") as MockSync, + ): + MockStore.return_value = store + MockProj.return_value = ProjectService(config_store=store) + MockSync.return_value = mock_sync + result = runner.invoke(app, args) + return result.exit_code, mock_sync + + +class TestProjectSelectionRequired: + """pull/diff/push must be told WHICH project(s) — no implicit cwd target.""" + + def test_pull_without_project_is_usage_error(self, tmp_path: Path) -> None: + code, mock = _invoke(["sync", "pull", "--directory", str(tmp_path)], tmp_path) + assert code == 2 + mock.pull.assert_not_called() + mock.pull_all.assert_not_called() + + def test_diff_without_project_is_usage_error(self, tmp_path: Path) -> None: + code, mock = _invoke(["sync", "diff", "--directory", str(tmp_path)], tmp_path) + assert code == 2 + mock.diff.assert_not_called() + + def test_push_without_project_is_usage_error(self, tmp_path: 
Path) -> None: + code, mock = _invoke(["sync", "push", "--directory", str(tmp_path)], tmp_path) + assert code == 2 + mock.push.assert_not_called() + + +class TestMutuallyExclusiveSelection: + """--project and --all-projects cannot be combined; --branch is per-project.""" + + def test_pull_project_and_all_projects_conflict(self, tmp_path: Path) -> None: + code, mock = _invoke( + ["sync", "pull", "--project", "prod", "--all-projects", "--directory", str(tmp_path)], + tmp_path, + ) + assert code == 2 + mock.pull.assert_not_called() + mock.pull_all.assert_not_called() + + def test_pull_branch_with_all_projects_conflict(self, tmp_path: Path) -> None: + code, mock = _invoke( + ["sync", "pull", "--all-projects", "--branch", "555", "--directory", str(tmp_path)], + tmp_path, + ) + assert code == 2 + mock.pull_all.assert_not_called() + + +class TestPushDryRunIsSafe: + """push --dry-run must run in dry-run mode and never write.""" + + def test_push_dry_run_passes_flag_and_does_not_error(self, tmp_path: Path) -> None: + store = _store(tmp_path / "config") + mock_sync = MagicMock() + # JSON mode avoids the human formatter's field expectations; the point of + # this test is that --dry-run reaches the service, not the print layout. 
+ mock_sync.push.return_value = { + "status": "dry_run", + "project_alias": "prod", + "summary": {"to_create": 0, "to_update": 1, "to_delete": 0}, + "changes": [], + } + with ( + patch("keboola_agent_cli.cli.ConfigStore") as MockStore, + patch("keboola_agent_cli.cli.ProjectService") as MockProj, + patch("keboola_agent_cli.cli.SyncService") as MockSync, + ): + MockStore.return_value = store + MockProj.return_value = ProjectService(config_store=store) + MockSync.return_value = mock_sync + result = runner.invoke( + app, + [ + "--json", + "sync", + "push", + "--project", + "prod", + "--dry-run", + "--directory", + str(tmp_path), + ], + ) + assert result.exit_code == 0, result.output + assert mock_sync.push.call_count == 1 + # The dry_run flag must be propagated to the service (no real write). + _, kwargs = mock_sync.push.call_args + assert kwargs.get("dry_run") is True
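The final assertion above relies on how `unittest.mock` records calls: `call_args` unpacks into the positional and keyword arguments of the most recent call. Below is a minimal, self-contained sketch of that pattern. `run_push` is a hypothetical stand-in for the CLI layer and is not part of `keboola_agent_cli`; the keyword names it forwards are assumptions chosen for illustration.

```python
from unittest.mock import MagicMock


def run_push(service, project: str, dry_run: bool = False) -> dict:
    """Hypothetical CLI-layer stand-in: forwards the flags to the service."""
    return service.push(project=project, dry_run=dry_run)


def check_dry_run_propagates() -> bool:
    """Return True when the dry_run flag reaches the mocked service as a keyword."""
    service = MagicMock()
    service.push.return_value = {"status": "dry_run", "changes": []}

    result = run_push(service, "prod", dry_run=True)

    # call_args unpacks to (positional_args, keyword_args) of the last call.
    args, kwargs = service.push.call_args
    return (
        service.push.call_count == 1
        and args == ()
        and kwargs == {"project": "prod", "dry_run": True}
        and result["status"] == "dry_run"
    )
```

Note that a check of the form `kwargs.get("dry_run") is True` only passes when the caller supplies the flag as a keyword argument; if it were passed positionally it would land in `args` instead, and the assertion would fail even though the flag was propagated.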