fix(teardown-bclaw): fix 4 bugs and drop the false-denial simulate gate#1

Merged: capotej merged 1 commit into main from fix/teardown-bclaw-bugs on Jun 29, 2026

capotej (Contributor) commented Jun 28, 2026

Summary

Fixes 4 bugs found by running the teardown-bclaw skill against a live bclaw
deploy on ECS Fargate (us-east-1), and removes the simulate-principal-policy
pre-flight gate (it returned false denials in both teardown runs).

  • Phase 1 — waiter hangs ~10min. aws ecs wait services-inactive waits for
    status INACTIVE, which only transitions on DeleteService (Phase 2), not on
    scale-to-0. Replaced with a runningCount poll loop matching the phase's
    actual "0 running tasks" intent.
  • Phase 2 + 5 — EFS query never expanded ${CLAW_NAME} (2 sites). The tag
    filter was inside single quotes so bash never expanded it; the literal
    ${CLAW_NAME}-data was searched and the query always returned empty. This
    broke the Phase 2 EFS capture and made the Phase 5 "efs: gone" check always
    report gone (falsely reassuring). Both sites fixed.
  • Phase 4 — bogus flag + hardcoded provider key. Dropped
    --force-delete-without-recovery (a secretsmanager delete-secret flag, not an
    ssm delete-parameter flag; SSM deletion is already immediate and
    irreversible). Replaced the hardcoded 6-name loop (which assumed OpenRouter
    and orphaned ANTHROPIC/ZAI keys) with a describe-parameters +
    delete-parameter loop over all of /bclaw/, so whichever provider key
    (OPENROUTER|ANTHROPIC|ZAI) was used is always deleted.
  • Phase 0/4 — wrong SSM permission claims. The deployer policy's
    SSMSecrets statement grants ssm:DeleteParameter on
    arn:aws:ssm:*:*:parameter/bclaw/* (and SSMDescribe grants
    ssm:DescribeParameters on *), so Phase 4 deletes succeed directly.
    Corrected the "zero SSM actions / console fallback" claims that were false
    against the template's own bclaw-deploy-policy.json.
  • Phase 0 — removed simulate-principal-policy entirely. It returned false
    denials in both teardown runs: ARN-scoped EC2/ECS actions are reported as
    denied when simulated against *, and tag-conditioned EFS actions are
    reported as denied when --context-entries is not supplied.
    Replaced with a lightweight get-caller-identity check; teardown is
    self-correcting and Phase 5 verifies any leftovers.
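
The Phase 1 bullet above replaces the waiter with a runningCount poll. A minimal sketch of what such a loop could look like follows; the function name `wait_for_zero_running`, its arguments, and the default attempt count and interval are illustrative assumptions, not the skill's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll runningCount until the service reports 0 tasks.
# Arguments: cluster, service, [max attempts], [seconds between attempts].
# The defaults below are illustrative, not values taken from the skill.
wait_for_zero_running() {
  local cluster="$1" service="$2"
  local attempts="${3:-40}" interval="${4:-15}"
  local i count=""
  for ((i = 1; i <= attempts; i++)); do
    # describe-services reports runningCount even while status stays ACTIVE.
    count="$(aws ecs describe-services \
      --cluster "$cluster" --services "$service" \
      --query 'services[0].runningCount' --output text)" || return 1
    if [[ "$count" == "0" ]]; then
      echo "0 running tasks"
      return 0
    fi
    sleep "$interval"
  done
  echo "timed out waiting for 0 running tasks (last runningCount: $count)" >&2
  return 1
}
```

Unlike `aws ecs wait services-inactive`, this returns as soon as the scale-to-0 has drained, because it checks the task count rather than the service status.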

The Phase 2 delete-stack / FORCE_DELETE_STACK mount-target behavior noted
as out-of-scope is already handled by the existing guidance — no change there.
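
The quoting bug from the Phase 2 + 5 bullet is easy to reproduce. The sketch below builds the JMESPath query inside double quotes so bash expands the claw name, while the JMESPath string literals stay in single quotes. The tag key `Name` and the function names `efs_query` and `find_efs_id` are assumptions made for illustration; the actual filter in the skill may differ.

```shell
#!/usr/bin/env bash
# Broken pattern, for contrast: the whole --query value sits in single
# quotes, so bash passes ${CLAW_NAME} through literally and nothing matches.
#   --query 'FileSystems[?Tags[?Key==`Name` && Value==`${CLAW_NAME}-data`]].FileSystemId'

# Hypothetical helper: build the query with the claw name substituted in.
# printf's %s does the substitution; the single quotes are JMESPath raw
# string literals, not shell quoting.
efs_query() {
  local claw_name="$1"
  printf "FileSystems[?Tags[?Key=='Name' && Value=='%s-data']].FileSystemId" \
    "$claw_name"
}

# Hypothetical helper: print the matching file system ID(s), or nothing.
find_efs_id() {
  aws efs describe-file-systems --query "$(efs_query "$1")" --output text
}
```

Usage would be along the lines of `find_efs_id "$CLAW_NAME"`. An empty result then means the file system is really gone, not that the filter silently failed to match.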

Test plan

  • pnpm build (tsc strict) clean
  • pnpm lint (biome) clean
  • pnpm test — 11/11 golden tests pass (rename invariants hold; template
    stays internally consistent and rename-safe)
  • shellcheck clean on all bash blocks (the one finding is the pre-existing
    CLAW_NAME=<user-provided> doc placeholder, unchanged from before)
  • jj diff reviewed — only the intended regions changed

capotej force-pushed the fix/teardown-bclaw-bugs branch 5 times, most recently from e9cd38f to 4bf039d, on June 28, 2026 23:59
…nces out of skills

teardown-bclaw (4 bug fixes from live-deploy findings):
- Phase 1: replace `ecs wait services-inactive` (waits for INACTIVE status,
  only set by DeleteService — never on scale-to-0, so it timed out ~10min)
  with a runningCount poll loop matching the phase's "0 running tasks" intent.
- Phase 2 + Phase 5: fix the EFS JMESPath tag filter. `${CLAW_NAME}` was inside
  single quotes so bash never expanded it — the literal `${CLAW_NAME}-data` was
  searched and the query always returned empty, breaking the Phase 2 EFS
  capture and the Phase 5 "efs: gone" check.
- Phase 4: drop the bogus `--force-delete-without-recovery` flag (it belongs
  to `secretsmanager delete-secret`, not `ssm delete-parameter`, which is already immediate);
  replace the hardcoded 6-name loop (assumed OpenRouter) with a
  describe+delete of every param under /bclaw/ so whichever provider key
  (OPENROUTER|ANTHROPIC|ZAI) was used is always deleted.
- Phase 4 note + Phase 0: correct the SSM permission story. The deployer
  policy's SSMSecrets statement grants ssm:DeleteParameter on
  parameter/bclaw/*, so Phase 4 deletes succeed directly — the old "zero SSM
  actions / console fallback" claims were wrong against the template's own
  policy.
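
A minimal sketch of the Phase 4 describe+delete loop described above, assuming a hypothetical function name `delete_bclaw_params` and the default prefix `/bclaw/`; it is an illustration of the approach, not the skill's exact code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete every SSM parameter whose name begins with
# the given prefix (default /bclaw/), whichever provider key was stored.
delete_bclaw_params() {
  local prefix="${1:-/bclaw/}" names name
  names="$(aws ssm describe-parameters \
    --parameter-filters "Key=Name,Option=BeginsWith,Values=${prefix}" \
    --query 'Parameters[].Name' --output text)" || return 1
  # Text output is whitespace-separated; word splitting here is intentional
  # (SSM parameter names cannot contain whitespace).
  for name in $names; do
    if [[ "$name" == "None" ]]; then
      continue
    fi
    # No recovery-window flag: ssm delete-parameter is immediate.
    aws ssm delete-parameter --name "$name" || return 1
    echo "deleted $name"
  done
}
```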

teardown-bclaw Phase 0:
- Remove the simulate-principal-policy pre-flight entirely. It returned false
  denials in both teardown runs (ARN-scoped actions over-report at *,
  tag-conditioned EFS actions under-report without --context-entries).
  Replaced with a lightweight get-caller-identity check; teardown is
  self-correcting and Phase 5 verifies any leftovers.
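
A minimal sketch of the lightweight Phase 0 check, assuming a hypothetical function name `preflight_identity`. It only confirms that credentials resolve to a principal and makes no claim about what that principal is allowed to do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail fast if no usable AWS credentials are present,
# otherwise print the caller ARN so the operator can confirm the principal.
preflight_identity() {
  local arn
  arn="$(aws sts get-caller-identity --query 'Arn' --output text)" || {
    echo "no valid AWS credentials; aborting teardown" >&2
    return 1
  }
  echo "running teardown as: $arn"
}
```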

teardown-bclaw Phase 2:
- Generalize the DELETE_FAILED guidance. The force-delete remedy was
  documented but framed as EFS-only; a second teardown showed IAM roles
  (bclaw-*) also hit DELETE_FAILED via NoSuchEntity when already gone. Now
  documents the class (CloudFormation can't confirm deletion of already-gone
  resources) with both observed manifestations (EFS mount targets 403, IAM
  roles NoSuchEntity) routing to the same --deletion-mode FORCE_DELETE_STACK fix.
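
A minimal sketch of the DELETE_FAILED remedy, assuming a hypothetical function name `force_delete_if_failed`. It checks the stack status first, since the force mode only applies to a stack that has already failed to delete.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a failed stack deletion in force mode.
# Only acts when the stack is in DELETE_FAILED; otherwise reports and skips.
force_delete_if_failed() {
  local stack="$1" status
  status="$(aws cloudformation describe-stacks --stack-name "$stack" \
    --query 'Stacks[0].StackStatus' --output text)" || return 1
  if [[ "$status" == "DELETE_FAILED" ]]; then
    aws cloudformation delete-stack --stack-name "$stack" \
      --deletion-mode FORCE_DELETE_STACK
  else
    echo "stack $stack is $status; force delete skipped"
  fi
}
```

The same call covers both observed manifestations (EFS mount targets and IAM roles), because the remedy depends on the stack status rather than on which resource blocked the delete.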

setup-bclaw (correctness fixes):
- Notes: the "Stack updates reset DesiredCount to 0 / future template
  improvement: make DesiredCount a parameter" bullet was factually stale —
  DesiredCount IS already a !Ref parameter with default 1 (enforced by
  validate-template.py). Rewrote to describe the real behavior: un-passed
  params revert to template defaults on stack update, re-pass the live count.
- Notes: removed all 7 citations to references/ (the moved author docs).

Move author references out of the shipped skills:
- The three setup-bclaw/references/ docs (template-pitfalls.md,
  iam-and-template-lessons.md, on-boot-commands.md) are background for
  AUTHORS of this repo (the policy and skills), not instructions for running
  setup/teardown. Moved to the repo-root references/ (not shipped in the
  generated claw) and removed all citations to references/ from the skills
  (setup-bclaw/SKILL.md x7, template.yaml, validate-template.py x2).
- agent_home/AGENTHOME.md: drop the stale "+ references/" from the skills/
  layout comment (no shipped skill has a references/ dir anymore).
- teardown-bclaw had no such citations.

AGENTS.md:
- Add a "## Skills" section codifying the convention: shipped skill content
  (template/.agents/skills/**) must be factual and present-tense — no
  historical/changelog narrative, no war-story framing; discovery narratives,
  migration histories, and lessons-learned belong in the repo-level
  references/ directory, and skills must not cite references/.

Verified: 11/11 golden tests pass (rename invariants hold), biome clean,
validate-template.py runs, shellcheck clean on teardown bash blocks.
capotej force-pushed the fix/teardown-bclaw-bugs branch from 4bf039d to 8379d6a on June 29, 2026 00:00
capotej merged commit 9dc26e8 into main on Jun 29, 2026
1 check passed
capotej deleted the fix/teardown-bclaw-bugs branch on June 29, 2026 00:03