fix(teardown-bclaw): fix 4 bugs and drop the false-denial simulate gate#1
Merged
Conversation
e9cd38f to
4bf039d
Compare
…nces out of skills
teardown-bclaw (4 bug fixes from live-deploy findings):
- Phase 1: replace `ecs wait services-inactive` (waits for INACTIVE status,
only set by DeleteService — never on scale-to-0, so it timed out ~10min)
with a runningCount poll loop matching the phase's "0 running tasks" intent.
- Phase 2 + Phase 5: fix the EFS JMESPath tag filter. `${CLAW_NAME}` was inside
single quotes so bash never expanded it — the literal `${CLAW_NAME}-data` was
searched and the query always returned empty, breaking the Phase 2 EFS
capture and the Phase 5 "efs: gone" check.
- Phase 4: drop the bogus `--force-delete-without-recovery` flag (that belongs
to secrets-manager, not `ssm delete-parameter`, which is already immediate);
replace the hardcoded 6-name loop (assumed OpenRouter) with a
describe+delete of every param under /bclaw/ so whichever provider key
(OPENROUTER|ANTHROPIC|ZAI) was used is always deleted.
- Phase 4 note + Phase 0: correct the SSM permission story. The deployer
policy's SSMSecrets statement grants ssm:DeleteParameter on
parameter/bclaw/*, so Phase 4 deletes succeed directly — the old "zero SSM
actions / console fallback" claims were wrong against the template's own
policy.
teardown-bclaw Phase 0:
- Remove the simulate-principal-policy pre-flight entirely. It returned false
denials in both teardown runs (ARN-scoped actions over-report at *,
tag-conditioned EFS actions under-report without --context-entries).
Replaced with a lightweight get-caller-identity check; teardown is
self-correcting and Phase 5 verifies any leftovers.
teardown-bclaw Phase 2:
- Generalize the DELETE_FAILED guidance. The force-delete remedy was
documented but framed EFS-only; a second teardown showed IAM roles
(bclaw-*) also hit DELETE_FAILED via NoSuchEntity when already gone. Now
documents the class (CloudFormation can't confirm deletion of already-gone
resources) with both observed manifestations (EFS mount targets 403, IAM
roles NoSuchEntity) routing to the same --deletion-mode FORCE_DELETE_STACK fix.
setup-bclaw (correctness fixes):
- Notes: the "Stack updates reset DesiredCount to 0 / future template
improvement: make DesiredCount a parameter" bullet was factually stale —
DesiredCount IS already a !Ref parameter with default 1 (enforced by
validate-template.py). Rewrote to describe the real behavior: un-passed
params revert to template defaults on stack update, re-pass the live count.
- Notes: removed all 7 citations to references/ (the moved author docs).
Move author references out of the shipped skills:
- The three setup-bclaw/references/ docs (template-pitfalls.md,
iam-and-template-lessons.md, on-boot-commands.md) are background for
AUTHORS of this repo (the policy and skills), not instructions for running
setup/teardown. Moved to the repo-root references/ (not shipped in the
generated claw) and removed all citations to references/ from the skills
(setup-bclaw/SKILL.md x7, template.yaml, validate-template.py x2).
- agent_home/AGENTHOME.md: drop the stale "+ references/" from the skills/
layout comment (no shipped skill has a references/ dir anymore).
- teardown-bclaw had no such citations.
AGENTS.md:
- Add a "## Skills" section codifying the convention: shipped skill content
(template/.agents/skills/**) must be factual and present-tense — no
historical/changelog narrative, no war-story framing; discovery narratives,
migration histories, and lessons-learned belong in the repo-level
references/ directory, and skills must not cite references/.
Verified: 11/11 golden tests pass (rename invariants hold), biome clean,
validate-template.py runs, shellcheck clean on teardown bash blocks.
4bf039d to
8379d6a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes 4 bugs found by running the teardown-bclaw skill against a live bclaw
deploy on ECS Fargate (
us-east-1), and removes thesimulate-principal-policypre-flight gate (it returned false denials in both teardown runs).
aws ecs wait services-inactivewaits forstatus
INACTIVE, which only transitions onDeleteService(Phase 2), not onscale-to-0. Replaced with a
runningCountpoll loop matching the phase'sactual "0 running tasks" intent.
${CLAW_NAME}(2 sites). The tagfilter was inside single quotes so bash never expanded it; the literal
${CLAW_NAME}-datawas searched and the query always returned empty. Thisbroke the Phase 2 EFS capture and made the Phase 5 "efs: gone" check always
report gone (falsely reassuring). Both sites fixed.
--force-delete-without-recovery(asecretsmanagerflag, notssm delete-parameter, which is already immediate/irreversible). Replaced thehardcoded 6-name loop (assumed OpenRouter, orphaning ANTHROPIC/ZAI keys) with a
describe-parameters+delete-parameterloop over all of/bclaw/sowhichever provider key (OPENROUTER|ANTHROPIC|ZAI) was used is always deleted.
SSMSecretsstatement grantsssm:DeleteParameteronarn:aws:ssm:*:*:parameter/bclaw/*(andSSMDescribegrantsssm:DescribeParameterson*), so Phase 4 deletes succeed directly.Corrected the "zero SSM actions / console fallback" claims that were false
against the template's own
bclaw-deploy-policy.json.simulate-principal-policyentirely. It returned falsedenials in both teardown runs (ARN-scoped EC2/ECS actions over-report at
*;tag-conditioned EFS actions under-report without
--context-entries).Replaced with a lightweight
get-caller-identitycheck; teardown isself-correcting and Phase 5 verifies any leftovers.
The Phase 2
delete-stack/FORCE_DELETE_STACKmount-target behavior notedas out-of-scope is already handled by the existing guidance — no change there.
Test plan
pnpm build(tsc strict) cleanpnpm lint(biome) cleanpnpm test— 11/11 golden tests pass (rename invariants hold; templatestays internally consistent and rename-safe)
shellcheckclean on all bash blocks (the one finding is the pre-existingCLAW_NAME=<user-provided>doc placeholder, unchanged from before)jj diffreviewed — only the intended regions changed