feat: durability hardening — externalize model config, add eval net, single-source the system prompt#2
Open
Cartooli wants to merge 4 commits into
Open
feat: durability hardening — externalize model config, add eval net, single-source the system prompt#2Cartooli wants to merge 4 commits into
Cartooli wants to merge 4 commits into
Conversation
…tings.json Model params were hardcoded in ai-tutor.sh business logic, making model swaps require code edits in two places with no fail-safe. Read them from the already-loaded config file with validated ranges and safe defaults equal to the historical values; missing/malformed config falls back cleanly. Ship teacher-settings.example.json and update PROMPT-BUILDER.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The system prompt was triplicated across ai-tutor.sh, CLAUDE.md, and PROMPT-BUILDER.md, and had already drifted (different banned-category lists). Move it to skills/safety/system-prompt.txt as the single source, load it fail-closed in ai-tutor.sh (missing/empty -> no API call), and add safety-check.sh check 9 verifying it covers every CLAUDE.md banned category. Docs now reference the file instead of restating it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
No tests existed for the safety pipeline, so a model swap or a weakened filter would go undetected (the 'silence test'). Add tests/ covering content-filter.sh (every injection pattern + every blocklist category + PII + clean cases, 36 assertions) and input-sanitizer.sh (6 assertions), plus an API-key-gated tutor golden-set that runs live only when a key is present and is skipped (not failed) in CI. Wire run-tests.sh into the safety-check workflow. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
EVASION_FILE was defined but never read; the leet-speak/spacing normalization is intentionally hardcoded in normalize(). Replace the dead var with a clarifying comment, mark evasion-patterns.txt as documentation only, and add leet-speak regression tests so the normalization is covered. Mark plan completed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Addresses the three feasible-to-fix issues from the durability review (score 30/56, HIGH RISK), targeting ~40/56 MODERATE — without touching the deterministic safety pipeline or stateless design that already scored well.
The shape was "airtight execution layer, no instruments, hardwired engine." This PR builds the instruments and unbolts the engine:
feat(config)) —model,temperature,max_tokenswere hardcoded inai-tutor.shbusiness logic. Now read from the already-loadedconfig/teacher-settings.jsonwith validated ranges and safe defaults equal to the historical values. Missing/malformed config falls back cleanly. Shipsconfig/teacher-settings.example.json. Fixes the swap test — change a model in one config value.feat(tests)) — there were no tests. Addedtests/covering every injection pattern, every blocklist category, PII, leet-speak evasion, and clean cases forcontent-filter.sh(39 assertions) +input-sanitizer.sh(6), wired into CI. Plus an API-key-gated tutor golden-set that runs live locally and is skipped (not failed) in CI. Fixes the silence test — a weakened filter or bad model swap now breaks CI.feat(safety)) — the prompt was triplicated acrossai-tutor.sh,CLAUDE.md, andPROMPT-BUILDER.mdand had already drifted (different banned-category lists). Moved toskills/safety/system-prompt.txt, loaded fail-closed (missing/empty → no API call), with a newsafety-check.shcheck 9 that verifies coverage of everyCLAUDE.mdbanned category. Fixes fragment coherence.chore(safety)) — removed the deadEVASION_FILEvariable; documented that evasion normalization is intentionally in-code and now test-covered.Key decisions
python3.Testing
bash tests/run-tests.sh→ 45 deterministic assertions pass; golden-set skips without a key../scripts/safety-check.sh→ all 9 checks pass.bash -nclean on all scripts; workflow YAML validates.Post-Deploy Monitoring & Validation
logs/audit-*.logfor new error codesconfig_parse_failed,prompt_file_missing,prompt_file_empty.bash ./tests/run-tests.sh(offline)./scripts/safety-check.shai_enabled: true:bash ./tests/test-tutor-golden.shbefore/after any model change.prompt_file_missingin audit logs (prompt file not deployed) → tutor fails closed to the friendly fallback; restoreskills/safety/system-prompt.txt. Revert the branch if CI tests regress.🤖 Generated with Claude Opus 4.8 (1M context) via Claude Code