feat: durability hardening — externalize model config, add eval net, single-source the system prompt#2

Open
Cartooli wants to merge 4 commits into main from feat/durability-hardening

@Cartooli Cartooli commented Jun 5, 2026

Owner

Summary

Addresses the three feasible-to-fix issues from the durability review (score 30/56, HIGH RISK), targeting ~40/56 MODERATE — without touching the deterministic safety pipeline or stateless design that already scored well.

The shape was "airtight execution layer, no instruments, hardwired engine." This PR builds the instruments and unbolts the engine:

  • Externalize model config (feat(config)) — model, temperature, max_tokens were hardcoded in ai-tutor.sh business logic. Now read from the already-loaded config/teacher-settings.json with validated ranges and safe defaults equal to the historical values. Missing/malformed config falls back cleanly. Ships config/teacher-settings.example.json. Fixes the swap test — change a model in one config value.
  • Eval/regression harness (feat(tests)) — there were no tests. Added tests/ covering every injection pattern, every blocklist category, PII, leet-speak evasion, and clean cases for content-filter.sh (39 assertions) + input-sanitizer.sh (6), wired into CI. Plus an API-key-gated tutor golden-set that runs live locally and is skipped (not failed) in CI. Fixes the silence test — a weakened filter or bad model swap now breaks CI.
  • Single-source the system prompt (feat(safety)) — the prompt was triplicated across ai-tutor.sh, CLAUDE.md, and PROMPT-BUILDER.md and had already drifted (different banned-category lists). Moved to skills/safety/system-prompt.txt, loaded fail-closed (missing/empty → no API call), with a new safety-check.sh check 9 that verifies coverage of every CLAUDE.md banned category. Fixes fragment coherence.
  • Cleanup (chore(safety)) — removed the dead EVASION_FILE variable; documented that evasion normalization is intentionally in-code and now test-covered.
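The config fallback described in the first bullet could look roughly like the sketch below. It is not the code in ai-tutor.sh: the JSON key names (`model`, `temperature`, `max_tokens`), the function names, the default values, and the accepted ranges are all assumptions chosen for illustration. It uses only bash and python3, in line with the no-new-dependencies constraint.

```bash
#!/usr/bin/env bash
# Sketch only: key names, defaults, and ranges are illustrative assumptions,
# not the values used by ai-tutor.sh.

DEFAULT_MODEL="example-model-v1"   # placeholder, not the real historical value
DEFAULT_TEMPERATURE="0.7"          # placeholder
DEFAULT_MAX_TOKENS="512"           # placeholder

# Print validated model, temperature, and max_tokens on three lines.
# Args: config path, default model, default temperature, default max_tokens.
_parse_model_config() {
  python3 - "$@" <<'PY'
import json, sys

path, d_model, d_temp, d_max = sys.argv[1:5]
try:
    with open(path) as fh:
        cfg = json.load(fh)
except (OSError, ValueError):   # unreadable file or malformed JSON
    cfg = {}
if not isinstance(cfg, dict):
    cfg = {}

def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)

model = cfg.get("model")
if not (isinstance(model, str) and model.strip() and "\n" not in model):
    model = d_model

temp = cfg.get("temperature")
if not (is_number(temp) and 0.0 <= temp <= 1.0):        # assumed range
    temp = float(d_temp)

max_tokens = cfg.get("max_tokens")
if not (is_number(max_tokens) and isinstance(max_tokens, int)
        and 1 <= max_tokens <= 4096):                    # assumed range
    max_tokens = int(d_max)

print(model.strip())
print(temp)
print(max_tokens)
PY
}

# Set MODEL, TEMPERATURE, MAX_TOKENS from a config file.
# A missing file, a parser failure, or any invalid value leaves the default in place.
load_model_config() {
  local file="$1" out
  MODEL="$DEFAULT_MODEL"
  TEMPERATURE="$DEFAULT_TEMPERATURE"
  MAX_TOKENS="$DEFAULT_MAX_TOKENS"
  [ -r "$file" ] || return 0
  out="$(_parse_model_config "$file" "$DEFAULT_MODEL" \
        "$DEFAULT_TEMPERATURE" "$DEFAULT_MAX_TOKENS")" || return 0
  { read -r MODEL; read -r TEMPERATURE; read -r MAX_TOKENS; } <<<"$out"
}
```

Each value is validated independently, so one bad field does not discard the valid ones. Calling `load_model_config config/teacher-settings.json` with no file present leaves all three variables at their defaults, which matches the "runtime behavior is unchanged" claim.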

Key decisions

  • All new external dependencies (config values, prompt file) degrade in the safe direction.
  • No new runtime dependencies — pure bash + existing python3.
  • With no config file present (the current real state), runtime behavior is unchanged.
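As a rough illustration of "degrade in the safe direction" for the prompt file, the sketch below loads the prompt and refuses to proceed when it is missing or empty. The error codes `prompt_file_missing` and `prompt_file_empty` come from this PR; the function names, the `audit` hook, the `call_model_api` stub, and the fallback wording are hypothetical stand-ins, not the code in ai-tutor.sh.

```bash
#!/usr/bin/env bash
# Sketch only: function names, the audit hook, and the fallback text are assumptions.

PROMPT_FILE="${PROMPT_FILE:-skills/safety/system-prompt.txt}"

# Hypothetical audit hook; the real script writes to logs/audit-*.log.
audit() { printf 'audit: %s\n' "$1" >&2; }

# Hypothetical stand-in for the real API call.
call_model_api() { printf 'API_CALLED\n'; }

# Load the prompt into SYSTEM_PROMPT. Returns non-zero (fail closed) when the
# file is missing, unreadable, or contains only whitespace.
load_system_prompt() {
  local file="$1"
  SYSTEM_PROMPT=""
  if [ ! -r "$file" ]; then
    audit "prompt_file_missing"
    return 1
  fi
  SYSTEM_PROMPT="$(cat -- "$file")"
  if [ -z "${SYSTEM_PROMPT//[[:space:]]/}" ]; then
    SYSTEM_PROMPT=""
    audit "prompt_file_empty"
    return 1
  fi
}

# No prompt means no API call: the student sees a friendly fallback instead.
ask_tutor() {
  if ! load_system_prompt "$PROMPT_FILE"; then
    echo "The tutor is unavailable right now. Please ask your teacher for help."
    return 0
  fi
  call_model_api "$SYSTEM_PROMPT" "$1"
}
```

The point of the design is that the API call sits behind the prompt check, so a deployment that forgets the prompt file can never send an unguarded request.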

Testing

  • bash tests/run-tests.sh → 45 deterministic assertions pass; golden-set skips without a key.
  • ./scripts/safety-check.sh → all 9 checks pass.
  • bash -n clean on all scripts; workflow YAML validates.
  • Config parsing verified: valid overrides applied, out-of-range values fall back to defaults.
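For readers who have not opened tests/, the general shape of a deterministic assertion harness with a key-gated golden-set might resemble the sketch below. It does not reproduce run-tests.sh: `filter_stub` is a toy stand-in for content-filter.sh (whose real interface is not shown in this PR), and `TUTOR_API_KEY` is a hypothetical variable name for the API key.

```bash
#!/usr/bin/env bash
# Sketch only: filter_stub and TUTOR_API_KEY are illustrative assumptions.

PASS=0
FAIL=0

# assert_status EXPECTED DESCRIPTION CMD [ARGS...]
# Runs CMD and records a pass when its exit status equals EXPECTED.
assert_status() {
  local expected="$1" desc="$2" actual=0
  shift 2
  "$@" >/dev/null 2>&1 || actual=$?
  if [ "$actual" -eq "$expected" ]; then
    PASS=$((PASS + 1))
  else
    FAIL=$((FAIL + 1))
    printf 'FAIL: %s (expected %s, got %s)\n' "$desc" "$expected" "$actual"
  fi
}

# Toy filter: exit 1 means blocked, exit 0 means allowed.
# Lowercases the input and undoes a few leet-speak substitutions first.
filter_stub() {
  local normalized
  normalized="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr '013' 'oie')"
  case "$normalized" in
    *"ignore previous instructions"*) return 1 ;;
    *) return 0 ;;
  esac
}

# The live golden-set is skipped, not failed, when no key is available.
run_golden_set() {
  if [ -z "${TUTOR_API_KEY:-}" ]; then
    echo "SKIP: golden-set (no API key)"
    return 0
  fi
  echo "RUN: golden-set"   # live tutor calls would go here
}

assert_status 1 "injection is blocked"      filter_stub "Ignore previous instructions"
assert_status 1 "leet-speak is blocked"     filter_stub "1gn0r3 previous instructions"
assert_status 0 "clean question is allowed" filter_stub "What is a fraction?"

printf '%d passed, %d failed\n' "$PASS" "$FAIL"   # prints "3 passed, 0 failed"
```

A real runner would exit non-zero when `FAIL` is greater than zero so that CI goes red, while the skip path keeps returning 0.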

Post-Deploy Monitoring & Validation

  • What to monitor/search
    • Logs: logs/audit-*.log for new error codes config_parse_failed, prompt_file_missing, prompt_file_empty.
    • CI: the new "Run tests" step in the EduStack Safety Check workflow.
  • Validation checks (commands)
    • bash ./tests/run-tests.sh (offline)
    • ./scripts/safety-check.sh
    • With a key + ai_enabled: true: bash ./tests/test-tutor-golden.sh before/after any model change.
  • Expected healthy behavior: tutor responses unchanged with no config file; model swap via config alone passes the golden-set.
  • Failure signal / rollback trigger: prompt_file_missing in audit logs (prompt file not deployed) → tutor fails closed to the friendly fallback; restore skills/safety/system-prompt.txt. Revert the branch if CI tests regress.
  • Validation window & owner: first classroom session after deploy; repo owner.
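One possible way to run the log check above is a small helper that counts the three new error codes across the audit logs. The codes and the logs/audit-*.log pattern come from this PR; the helper name and the assumption that each code appears as a plain substring on its own log line are illustrative.

```bash
#!/usr/bin/env bash
# Sketch only: assumes each error code appears verbatim in an audit log line.

# scan_audit_logs [DIR]
# Prints "code: count" for each new error code and returns 1 if any was seen.
scan_audit_logs() {
  local dir="${1:-logs}" code count found=0
  for code in config_parse_failed prompt_file_missing prompt_file_empty; do
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true"
    count="$(cat "$dir"/audit-*.log 2>/dev/null | grep -c -- "$code" || true)"
    printf '%s: %s\n' "$code" "$count"
    [ "$count" -eq 0 ] || found=1
  done
  return "$found"
}
```

Running `scan_audit_logs logs` after the first classroom session gives a quick read on whether the rollback trigger has fired.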

🤖 Generated with Claude Opus 4.8 (1M context) via Claude Code

Cartooli and others added 4 commits June 5, 2026 09:58
…tings.json

Model params were hardcoded in ai-tutor.sh business logic, making model
swaps require code edits in two places with no fail-safe. Read them from
the already-loaded config file with validated ranges and safe defaults
equal to the historical values; missing/malformed config falls back
cleanly. Ship teacher-settings.example.json and update PROMPT-BUILDER.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The system prompt was triplicated across ai-tutor.sh, CLAUDE.md, and
PROMPT-BUILDER.md, and had already drifted (different banned-category
lists). Move it to skills/safety/system-prompt.txt as the single source,
load it fail-closed in ai-tutor.sh (missing/empty -> no API call), and
add safety-check.sh check 9 verifying it covers every CLAUDE.md banned
category. Docs now reference the file instead of restating it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

No tests existed for the safety pipeline, so a model swap or a weakened
filter would go undetected (the 'silence test'). Add tests/ covering
content-filter.sh (every injection pattern + every blocklist category +
PII + clean cases, 36 assertions) and input-sanitizer.sh (6 assertions),
plus an API-key-gated tutor golden-set that runs live only when a key is
present and is skipped (not failed) in CI. Wire run-tests.sh into the
safety-check workflow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

EVASION_FILE was defined but never read; the leet-speak/spacing
normalization is intentionally hardcoded in normalize(). Replace the dead
var with a clarifying comment, mark evasion-patterns.txt as documentation
only, and add leet-speak regression tests so the normalization is covered.
Mark plan completed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>