feat: learning system Wave 2 quality improvements#162

Merged
dean0x merged 14 commits into main from feat/learning-wave2-quality
Mar 25, 2026

Conversation


@dean0x dean0x commented Mar 25, 2026

Summary

Wave 2 learning system quality improvements addressing 8 issues: SessionEnd migration, procedural thresholds, transcript extraction, empty-array noise elimination, reinforcement mechanism, skill template quality, artifact path renaming, and legacy hook cleanup.

Changes

Learning Pipeline

  • Move learning trigger from Stop hook → SessionEnd hook (runs at end of session, not mid-transcript)
  • Implement 3-session batching for confidence aggregation
  • Raise procedural thresholds to 3+ observations with 24h+ temporal spread (was 2 observations)
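The raised procedural threshold can be sketched as a small predicate: an observation only graduates to an artifact after 3+ sightings spread over at least 24 hours. This is an illustrative sketch with assumed function and variable names, not the hook's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical threshold check: 3+ observations AND 24h+ (86400s) temporal spread.
qualifies_for_artifact() {
  local count="$1" first_epoch="$2" now_epoch="$3"
  local spread=$((now_epoch - first_epoch))
  [ "$count" -ge 3 ] && [ "$spread" -ge 86400 ]
}

# 3 observations over 90000s (> 24h) passes the gate
if qualifies_for_artifact 3 1000000 1090000; then
  echo "create artifact"   # → create artifact
else
  echo "keep observing"
fi
```

Requiring both count and spread prevents a single burst of repeated actions in one session from immediately becoming a learned artifact.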

Transcript Extraction & Parsing

  • Fix transcript string-type handling (extract content from content[0].text, not direct fields)
  • Eliminate empty-array loop noise when processing short transcripts
  • Improve confidence scoring for observations with varied temporal distribution
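The string-type fix above can be illustrated with jq: transcript messages may carry `content` as either a plain string or an array of content blocks, so extraction has to branch on the type. The JSONL samples below are fabricated for illustration and may not match the real transcript schema exactly.

```shell
# Minimal sketch of type-aware content extraction (assumed field layout).
extract_text() {
  jq -r 'if (.message.content | type) == "string"
         then .message.content
         else .message.content[0].text // empty
         end'
}

echo '{"message":{"content":"plain string"}}' | extract_text
# → plain string
echo '{"message":{"content":[{"type":"text","text":"block text"}]}}' | extract_text
# → block text
```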

Reinforcement Mechanism

  • Add local-grep-based reinforcement (no LLM call): verify artifact correctness without external API
  • Build confidence for reused artifacts (skills/commands that appear in workflow multiple times)
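The LLM-free reinforcement idea can be sketched as: grep the session transcript for each learned artifact's slug, and bump its `last_seen` timestamp in the learning log when it appears. The file layout and field names here are assumptions, not the hook's real schema; the PR says the real code routes the update through a `json_update_field` helper, while plain jq is shown here.

```shell
# Hypothetical reinforcement pass: no LLM call, just grep + a JSONL field update.
reinforce_artifact() {
  local transcript="$1" slug="$2" log="$3"
  # Only bump last_seen when the slug actually appears in this session
  if grep -q "$slug" "$transcript"; then
    jq -c --arg slug "$slug" \
          --arg now "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
      'if .slug == $slug then .last_seen = $now else . end' \
      "$log" > "$log.tmp" && mv "$log.tmp" "$log"
  fi
}
```

Because this runs locally on every session end, it builds confidence for reused artifacts at zero API cost.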

Skill Template Quality

  • Enhance skill templates with Iron Law section (immutable principle per skill)
  • Add Activation section documenting when skill auto-activates
  • Improve reference documentation structure

Artifact Path Migration

  • Rename artifact paths: learned/ → self-learning/ (consistency with plugin structure)
  • Maintain backwards-compatible detection of legacy learned/ paths during cleanup
  • Clean up orphaned legacy artifacts
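Backwards-compatible detection during the rename can be sketched as a simple directory probe; the directory names follow the PR's rename, but the function and variable names are illustrative assumptions.

```shell
# Hypothetical legacy-path probe used during the learned/ -> self-learning/ migration.
detect_legacy_artifacts() {
  local commands_dir="$1"
  if [ -d "$commands_dir/learned" ]; then
    echo "legacy"    # old layout still present: trigger cleanup/migration
  else
    echo "current"   # nothing to migrate
  fi
}
```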

Legacy Cleanup

  • Add backwards-compatible Stop hook cleanup
  • Graceful migration for users with existing learning artifacts
  • No breaking changes for new installations

Breaking Changes

None

Testing

  • Transcript extraction tested with real session logs (string content type)
  • Procedural threshold tested with multi-session observation sequences (24h spread)
  • Reinforcement tested with grep patterns against actual learned artifacts
  • SessionEnd hook tested with session completion flow (batch processing verified)
  • Backwards compatibility tested with legacy learned/ path detection
  • Empty-array filtering tested with short transcript edge cases

Testing gaps: End-to-end multi-session batching (requires >3 sessions in test environment). Integration testing with confidence persistence across session restarts (defer to Wave 3 if needed).

Related Issues

Closes Wave 2 quality epic (related: #160, #161)


Co-Authored-By: Claude <noreply@anthropic.com>

Dean Sharon added 6 commits March 25, 2026 18:17
…ng, naming cleanup

- Move learning from Stop → SessionEnd hook with 3-session batching
  (adaptive: 5-session batch at 15+ observations)
- Raise procedural thresholds to 3 observations + 24h temporal spread
  (aligned with workflows; initial confidence 0.33 for both types)
- Fix transcript extraction for string-typed message content
- Eliminate empty-array loop noise in process_observations/create_artifacts
- Add reinforcement mechanism: local grep updates last_seen for loaded
  self-learning artifacts on each session end (no LLM cost)
- Improve skill template quality: Iron Law section, activation triggers,
  proper frontmatter with user-invocable/allowed-tools
- Rename artifact paths: commands from learned/ → self-learning/,
  skills from learned-{slug}/ → {slug}/
- Add backwards-compatible legacy Stop hook cleanup in removeLearningHook
- Deprecate stop-update-learning (stub that exits immediately)
- Lower default max_daily_runs from 10 → 5
- Update tests (444 pass), docs, and CLI strings
…hook removal, extract artifact name helper

- Fix daily cap counter format mismatch: session-end-learning wrote two-line
  format but background-learning reads tab-separated (same file)
- Standardize env var naming: BG_LEARNER -> DEVFLOW_BG_LEARNER (matches
  existing DEVFLOW_BG_UPDATER convention used by memory hooks)
- Use shared log-paths helper in session-end-learning (was computing its
  own slug with different separator, writing to different log directory)
- Use json_update_field from json-parse library instead of inline jq/sed
  fallback for artifact reinforcement
- Extract artifactName() helper in json-helper.cjs to deduplicate path
  parsing across learning-created and learning-new operations
- Extract removeFromEvent() in learn.ts to deduplicate SessionEnd/Stop
  hook removal logic
- Remove dead extract_user_messages() from background-learning (superseded
  by batch mode, referenced undefined SESSION_ID)
P0-Functionality: session-end-learning must read hook JSON from stdin
(like all other hooks) instead of expecting positional args. Without
this fix, CWD is always empty and the hook silently exits on every
invocation, making the entire learning system non-functional.

P1-Error Handling: add || true to json_field calls inside the
reinforcement while-loop so a single malformed JSONL line does not
crash the script under set -euo pipefail.

P1-Functionality: extract session_id from hook JSON (preferred) with
ls -t fallback, instead of relying solely on ls -t which could pick
a different session's transcript under concurrent session endings.

P1-Functionality: remove duplicate daily counter increment from
background-learning (session-end-learning already increments before
spawning), preventing the effective daily cap from being halved.

P2-Consistency: fix configure wizard max_daily_runs default (10 -> 5)
to match the new code default.

P2-Naming: remove stale $SESSION_ID references from batch-mode log
messages in background-learning.
Replace stale .learning-last-trigger reference with
.learning-session-count and .learning-batch-ids to match
CLAUDE.md and the new SessionEnd batching implementation.
fi

# Write batch IDs file for background-learning to consume
cp "$SESSION_COUNT_FILE" "$BATCH_IDS_FILE"

Race Condition: Batch File Handoff Not Atomic

Lines 190-192 use cp + rm to move the session count file to the batch file. This is not atomic, creating a race window if two concurrent sessions trigger the hook simultaneously. Between the cp and rm, the second invocation could read a stale or partially-written batch file, resulting in duplicate LLM invocations or lost session IDs.

Flagged by: Security (85%), Architecture (85%)

Fix: Use mv instead:

mv "$SESSION_COUNT_FILE" "$BATCH_IDS_FILE"

This is atomic on the same filesystem and eliminates the race window.


# --- Find transcript ---
# Encode CWD for Claude's project path
ENCODED_CWD=$(echo "$CWD" | sed 's|/|-|g')

CWD Encoding Inconsistency

Line 63 uses sed 's|/|-|g' to encode the CWD, but this diverges from the established pattern in background-learning and background-memory-update, which use sed 's|^/||' | tr / -. While both produce similar results for typical paths, the inconsistency is fragile and could break with path variations.

Flagged by: Consistency (95%), Architecture (85%)

Fix: Align with the existing pattern:

ENCODED_CWD=$(echo "$CWD" | sed 's|^/||' | tr / -)
PROJECTS_DIR="$HOME/.claude/projects/-${ENCODED_CWD}"

This ensures consistency across all hook scripts that need to encode paths.

local updated=false
local temp_log="${learning_log}.tmp"

while IFS= read -r line; do

Per-Line Subprocess Spawning in reinforce_loaded_artifacts

Lines 108-131 iterate through every line of learning-log.jsonl in a while read loop, spawning json_field (jq or node subprocess) 2-3 times per line. With 50+ observations, this creates 100-300+ subprocesses in the synchronous SessionEnd hook path, adding measurable latency (0.5-2s) to every session end.

Flagged by: Performance (92%), Complexity (85%)

Fix: Replace with a single-pass jq operation:

if [ "$_HAS_JQ" = "true" ]; then
  local slugs_regex=$(echo "$loaded" | tr '\n' '|' | sed 's/|$//')
  jq -c --arg now "$now_iso" --arg slugs "$slugs_regex" '
    if .status == "created" and .artifact_path != "" then
      (if (.artifact_path | test("/commands/"))
       then (.artifact_path | split("/") | .[-1] | rtrimstr(".md"))
       else (.artifact_path | split("/") | .[-2]) end) as $slug
      | if ($slug | test($slugs)) then .last_seen = $now else . end
    else . end
  ' "$learning_log" > "$temp_log"
fi

This reduces from N spawns to 1 and eliminates the blocking I/O.

# Check temporal spread (applies to BOTH workflow and procedural)
STATUS=$(echo "$EXISTING_LINE" | json_field "status" "")
if [ "$OBS_TYPE" = "workflow" ] && [ "$STATUS" != "created" ]; then
if [ "$STATUS" != "created" ]; then

Duplicate Temporal Spread Calculation in process_observations

Lines 508-517 and 521-530 both compute FIRST_EPOCH and NOW_EPOCH identically for the same observation. This duplicates date-parsing overhead and violates DRY: any fix to the date-parsing logic must be applied twice, risking divergence.

Flagged by: Complexity (95%)

Fix: Extract a single check_temporal_spread() function:

check_temporal_spread() {
  local first_seen="$1"
  FIRST_EPOCH=$(date -j -f "%Y-%m-%dT%H:%M:%SZ" "$first_seen" +%s 2>/dev/null \
    || date -d "$first_seen" +%s 2>/dev/null \
    || echo "0")
  NOW_EPOCH=$(date +%s)
  SPREAD=$((NOW_EPOCH - FIRST_EPOCH))
}

Then call once and reuse SPREAD in both blocks.

@@ -127,11 +131,11 @@ export function removeLearningHook(settingsJson: string): string {
export function hasLearningHook(settingsJson: string): boolean {

hasLearningHook Returns False for Legacy Stop Hook

This function only checks settings.hooks.SessionEnd for the new marker. Users who installed learning before this PR have the hook registered under settings.hooks.Stop with the legacy stop-update-learning marker. After upgrading the CLI (before running --disable && --enable), hasLearningHook() returns false, causing --status to show learning as disabled even though the (now-deprecated) Stop hook is still executing.

Flagged by: Regression (85%), Architecture (72%)

Fix: Either (a) have hasLearningHook also detect the legacy marker and show a "needs upgrade" state, or (b) have --enable auto-detect and upgrade the legacy hook in-place. Option (a) is clearer:

export function hasLearningHook(settingsJson: string): boolean {
  const settings: Settings = JSON.parse(settingsJson);
  const hasSessionEnd = settings.hooks?.SessionEnd?.some(h => h.includes(HOOK_MARKER)) ?? false;
  const hasLegacyStop = settings.hooks?.Stop?.some(h => h.includes(LEGACY_HOOK_MARKER)) ?? false;
  return hasSessionEnd || hasLegacyStop; // true for either generation of the hook
}

export function getLearningStatus(settingsJson: string): string {
  const settings: Settings = JSON.parse(settingsJson);
  if (settings.hooks?.SessionEnd?.some(h => h.includes(HOOK_MARKER))) return "enabled";
  if (settings.hooks?.Stop?.some(h => h.includes(LEGACY_HOOK_MARKER))) {
    return "needs upgrade (legacy Stop hook). Run: devflow learn --disable && devflow learn --enable";
  }
  return "disabled";
}

* Add the learning SessionEnd hook to settings JSON.
* Idempotent — returns unchanged JSON if hook already exists.
*/
export function addLearningHook(settingsJson: string, devflowDir: string): string {

addLearningHook Does Not Clean Up Legacy Stop Hook

When users run devflow learn --enable, this function only adds the new SessionEnd hook. It does not remove the legacy Stop hook that may still exist from pre-Wave-2 installations. Users who upgrade by running --enable will end up with both hooks registered, wasting a hook invocation on every session stop (the legacy Stop hook now just exits immediately).

Flagged by: Architecture (83%), Consistency (85%)

Fix: Have addLearningHook clean up the legacy hook first:

export function addLearningHook(settingsJson: string, devflowDir: string): string {
  // First, clean up any legacy Stop hook
  const cleaned = removeLearningHook(settingsJson);
  const settings: Settings = JSON.parse(cleaned);
  
  // Now add the new SessionEnd hook
  if (!settings.hooks) settings.hooks = {};
  if (!settings.hooks.SessionEnd) settings.hooks.SessionEnd = [];
  
  const hookPath = path.join(devflowDir, "hooks", HOOK_FILE);
  if (!settings.hooks.SessionEnd.includes(hookPath)) {
    settings.hooks.SessionEnd.push(hookPath);
  }
  
  return JSON.stringify(settings);
}

This makes --enable self-upgrading for existing users.


dean0x commented Mar 25, 2026

Code Review Summary: Wave 2 Quality Issues (60-79% Confidence)

This PR introduces several consistency and style issues beyond the inline comments already posted. While individually minor, they should be addressed to maintain code quality:

Consistency Issues (80-85% confidence)

  1. Log Timestamp Format Inconsistency (scripts/hooks/session-end-learning:55)

    • session-end-learning uses date '+%H:%M:%S' (HH:MM:SS local time)
    • background-learning uses date -u '+%Y-%m-%dT%H:%M:%SZ' (ISO 8601 UTC)
    • Since both write to the same log file, entries will be inconsistently formatted
    • Fix: Use ISO 8601 UTC format in both scripts for consistency
  2. Conditional Logging Inconsistency (scripts/hooks/session-end-learning:53)

    • session-end-learning wraps logging in if [ "$DEBUG" = "true" ] guard
    • background-learning logs unconditionally
    • Creates confusing developer experience where background logs appear but trigger hook is silent
    • Fix: Make both conditional on DEBUG (preferred) or both unconditional
  3. Sourcing Syntax Inconsistency (scripts/hooks/session-end-learning:21)

    • New hook uses . "$SCRIPT_DIR/json-parse" (POSIX dot syntax)
    • All other hooks use source "$SCRIPT_DIR/json-parse" (bash syntax)
    • Fix: Use source for consistency with existing patterns
  4. Shell Strictness Inconsistency (scripts/hooks/session-end-learning:12)

    • New hook uses set -euo pipefail
    • All other hooks use set -e only
    • Stricter flags could cause unexpected failures if upstream scripts rely on different semantics
    • Fix: Use set -e to match codebase convention
  5. Missing disown After Background Spawn (scripts/hooks/session-end-learning:200)

    • New hook spawns with nohup bash ... & but omits disown
    • Existing stop-update-memory uses nohup ... & followed by disown
    • Background job remains in job table unnecessarily
    • Fix: Add disown after background process spawn for consistency

Documentation Issues (90% confidence)

file-organization.md Not Updated (docs/reference/file-organization.md:50,157)

  • Still references stop-update-learning as the learning hook
  • Still says "Stop hook" instead of "SessionEnd hook"
  • Creates active code-comment drift with actual implementation
  • Fix: Update both references to session-end-learning and SessionEnd event type

Security/Robustness Issues (80-82% confidence)

  1. ART_DESC Unescaped in YAML (scripts/hooks/background-learning:632,640)

    • Model-generated descriptions interpolated directly into YAML frontmatter
    • If model returns description with " or YAML-significant characters, frontmatter breaks
    • Fix: Escape quotes before interpolation
  2. ART_NAME Sanitization Incomplete (scripts/hooks/background-learning:592)

    • Path traversal sanitization only strips / and .. pairs
    • Doesn't handle consecutive .. (e.g., .... reduces to .. after one pass)
    • Allows spaces, backticks, and shell metacharacters in paths
    • Fix: Use allowlist approach matching naming rules (kebab-case alphanumerics only)
  3. Session ID Validation Missing (scripts/hooks/session-end-learning:162)

    • SESSION_ID from hook JSON appended directly to counter file without validation
    • While source is trusted (Claude runtime), defense-in-depth check is missing
    • Malformed session ID with newlines could inflate batch count
    • Fix: Validate format before appending
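The three defense-in-depth fixes above can be sketched together; the exact escaping and allowlist patterns here are illustrative assumptions, not the ones the hooks ship.

```shell
# Issue 1: make a model-generated description safe inside double-quoted YAML.
escape_yaml_desc() {
  printf '%s' "$1" | sed 's/"/\\"/g'
}

# Issue 2: strict kebab-case allowlist; strips dots, slashes, spaces, metacharacters.
sanitize_art_name() {
  printf '%s' "$1" | tr -cd 'a-z0-9-'
}

# Issue 3: reject session IDs containing anything outside a safe character set
# (including embedded newlines that could inflate the batch count).
valid_session_id() {
  case "$1" in
    *[!A-Za-z0-9_-]*|'') return 1 ;;
    *) return 0 ;;
  esac
}
```

An allowlist is preferable to iterative blocklist stripping because patterns like `....` cannot survive a single character-class filter, whereas one-pass `..` removal can leave traversal sequences behind.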

Performance Issue (90% confidence)

Per-Line Subprocess Spawning in extract_batch_messages (scripts/hooks/background-learning:166-177)

  • Spawns json_extract_messages subprocess per user message line
  • With 3 sessions × 50 messages = 150 subprocess spawns
  • Extends background lock hold time
  • Fix: Use single-pass jq command to extract all messages in one invocation
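The single-pass fix can be sketched as one jq invocation that pulls every user-message text from a JSONL transcript, replacing the per-line subprocess spawns. The field names mirror the transcript samples discussed earlier in this review and may differ from the real schema.

```shell
# Hypothetical single-pass extraction: one jq process for the whole transcript,
# handling both string and array content shapes.
extract_all_user_messages() {
  jq -r 'select(.type == "user")
         | .message.content
         | if type == "string" then . else (.[0].text // empty) end' "$1"
}
```

This turns 150 subprocess spawns (3 sessions × 50 messages) into one, shrinking the background lock hold time accordingly.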

Overall Assessment

Blocking Issues: 6 flagged via inline comments (race condition, CWD encoding, subprocess spawning, temporal spread duplication, hasLearningHook legacy detection, addLearningHook cleanup)

Remaining Issues: The 12 items above should be addressed before merge. Most are quick fixes.

Recommendation: CHANGES_REQUESTED with focus on:

  1. file-organization.md documentation update (must-do for accuracy)
  2. Log timestamp/conditional logging consistency
  3. ART_NAME sanitization improvement

Claude Code Review | 2026-03-25

Dean Sharon and others added 8 commits March 25, 2026 21:31
- Replace set -euo pipefail with set -e (consistency with other hooks)
- Change . to source for json-parse/log-paths sourcing (consistency)
- Fix CWD encoding to match background-learning (strip leading slash)
- Use ISO 8601 UTC timestamps in log() (consistency with other hooks)
- Remove DEBUG guard from log() (align with unconditional logging in other hooks)
- Validate session ID format before appending to batch file
- Replace per-line subprocess spawning in reinforce_loaded_artifacts
  with single-pass jq/node operation
- Replace non-atomic cp+rm with atomic mv for batch file handoff
- Add disown after background process spawn (consistency with stop-update-memory)
- Extract run_batch_check() function from top-level procedural code

Co-Authored-By: Claude <noreply@anthropic.com>
…g enable, batch_size config

- hasLearningHook now returns 'current' | 'legacy' | false to detect
  pre-Wave-2 users with Stop hook containing stop-update-learning
- addLearningHook is self-upgrading: calls removeLearningHook first to
  clean up legacy Stop hooks before adding SessionEnd hook
- formatLearningStatus shows legacy upgrade instructions when detected
- Added batch_size to LearningConfig interface, defaults (3), and
  applyConfigLayer; added to --configure wizard
- Updated tests: 59 total including new legacy detection, self-upgrading
  enable, batch_size config, and type guard tests

Co-Authored-By: Claude <noreply@anthropic.com>
- Replace per-line subprocess spawning in extract_batch_messages with
  single-pass jq/node processing (issue #1)
- Decompose process_observations into validate_observation,
  calculate_confidence, and check_temporal_spread helpers (issue #2)
- Fix duplicate temporal spread calculation by computing epoch once
  in check_temporal_spread (issue #3)
- Escape double quotes in ART_DESC for YAML frontmatter safety (issue #4)
- Strengthen ART_NAME sanitization with strict kebab-case allowlist (issue #5)
- Replace per-line subprocess in apply_temporal_decay with single-pass
  jq operation and node fallback (issue #6)
- Replace per-line subprocess in create_artifacts status update with
  single-pass jq/node operation (issue #7)
- Remove dead increment_daily_counter function (issue #8)
- Extract write_command_artifact and write_skill_artifact helpers
  from create_artifacts (issue #9)
- Change flat 30k char truncation to per-session 8k char cap for
  proportional session contribution (issue #10)
- Add section comment markers to build_sonnet_prompt heredoc for
  navigability (issue #11)
- Add loadAndCountObservations tests (mixed valid/invalid, all-valid,
  empty input, invalidCount calculation)
- Add extract-text-messages string content path test
- Add learning-new operation test with self-learning prefix verification
- Update learning-created fixture paths to production-realistic values
- Add session-end-learning structural checks (syntax list, shebang,
  json-parse sourcing)
… in learn.ts

- Extract readObservations() to deduplicate try/catch + loadAndCountObservations pattern
- Extract warnIfInvalid() to deduplicate invalidCount > 0 warning message
- Hoist logPath computation once instead of repeating in 4 branches
- Remove unnecessary String() and !! casts on already-typed prompt values
- file-organization.md: session-end-learning replaces stop-update-learning
- CHANGELOG.md: add Wave 2 Changed + Fixed entries under [Unreleased]
- CLAUDE.md: include deprecated stop-update-learning in hooks list
…er.cjs

Move 4 operations (temporal-decay, process-observations, create-artifacts,
filter-observations) from shell to Node, reducing background-learning from
819 to 496 lines. Remove 10 shell functions, 4 dead json-parse wrappers,
and 1 dead json-helper.cjs operation. Add 27 new tests covering all paths.

Addresses PF-004 (background hook god script).
@dean0x dean0x merged commit ff9845a into main Mar 25, 2026
3 of 4 checks passed
@dean0x dean0x deleted the feat/learning-wave2-quality branch March 25, 2026 22:28