Consolidated test results from the integrated testing session. Numbers in this document reflect the April 3 focused test. Machine: ubuntu-1 (Linux x86-64). Plan: Max 20 ($200/mo). Proxy: cc-relay v2 (budget detection + session ID support).
Latest data (April 8): Full-week proxy data shows significantly higher event counts — B4: 3,782 (vs 327 here), B5: 72,839 (vs 261 here), B8: 2.37x avg across 532 files (vs 2.87x single session here). See 13_PROXY-DATA.md.
| Component | Version | Path |
|---|---|---|
| npm (Node.js) | v2.1.91 | ~/.nvm/versions/node/v22.22.1/bin/claude |
| Standalone (ELF) | v2.1.91 | ~/.local/bin/claude |
| Proxy | cc-relay v2 | ~/GitHub/cc-relay/ on :8080 |
| DB | SQLite | ~/.cc-relay/usage.db |
Method: Searched all existing JSONL files for `<synthetic>` model entries (a minimal version of this scan is sketched after the interpretation below).
Result:
- 151 `<synthetic>` entries across 65 session files
- 25 in `compact` sub-agents, 126 in regular sessions
- Signature: `model=<synthetic> input=0 output=0 stop=stop_sequence`
- Text: "No response requested."
Interpretation: Client-side rate limiter fires WITHOUT calling the API. Confirmed on our setup (v2.1.90). Not a rare edge case — 65 sessions affected.
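A minimal version of that scan, assuming the default session-log location under `~/.claude/projects/` and the `message.model` field layout (both assumptions, not verified for every CLI version):

```python
import json
from pathlib import Path

# Session logs are assumed to live under ~/.claude/projects/; adjust if yours differ.
SESSIONS = Path.home() / ".claude" / "projects"

hits, files_hit = 0, set()
for path in SESSIONS.rglob("*.jsonl"):
    for line in path.read_text(errors="replace").splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        # Assumed entry shape: {"message": {"model": ..., "usage": {...}}, ...}
        if (entry.get("message") or {}).get("model") == "<synthetic>":
            hits += 1
            files_hit.add(path)

print(f"{hits} <synthetic> entries across {len(files_hit)} session files")
```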
Method: cc-relay `_scan_microcompact()` detects `[Old tool result content cleared]` in outgoing request bodies.
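The detector reduces to a marker search over message indices. A minimal sketch, assuming a parsed `/v1/messages` request body with the public Messages API block layout; cc-relay's actual implementation may differ:

```python
import json

MARKER = "[Old tool result content cleared]"

def scan_microcompact(body: dict) -> list[int]:
    """Return message indices whose tool_result content carries the cleared marker."""
    cleared = []
    for i, msg in enumerate(body.get("messages", [])):
        content = msg.get("content")
        if not isinstance(content, list):
            continue
        for block in content:
            if (isinstance(block, dict)
                    and block.get("type") == "tool_result"
                    and MARKER in json.dumps(block.get("content", ""))):
                cleared.append(i)
                break  # one hit per message index is enough
    return cleared
```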
Results (v2.1.90, from existing data):
| Session | Events | Cleared Indices | Msg Range | Tool Results |
|---|---|---|---|---|
| Session 1 (02:29-02:54) | 67 | [20,22,38,40,44,46,162,166,172,174,206] | 121→251 | 53→112 |
| Session 2 (09:23-09:47) | 71 | [2,4,6,8,10,12,58,60,68,70,74,76] | 59→135 | 38→81 |
Key Findings:
- All cleared indices are even-numbered → targets tool_use/tool_result pairs specifically
- Indices expand over time — starts with early messages, progressively includes newer ones
- Main session cache NOT affected — ratio stays 99%+ during clearing (stable substitution preserves prefix)
- Sub-agent cold starts DO show cache drops — 0%, 39%, 82% observed at clearing moments
- All GrowthBook gates disabled yet clearing occurs → indicates a compaction path independent of the three documented GrowthBook gates
v2.1.91 Verification:
- 14 microcompact events in v2.1.91 test sessions
- Same index pattern `[58, 60, 68, 70, 74, 76]`
- No change from v2.1.90 — v2.1.91 does not fix microcompact
Cache Impact:
| Context | During Clearing | Without Clearing |
|---|---|---|
| Main session | 99%+ (no impact) | 99%+ |
| Sub-agent cold start | 0-39% | 47-87% |
| Sub-agent warmed | 94-99% | 94-99% |
Revised Assessment: Microcompact does NOT cause cache invalidation in main sessions. The real harm is context quality degradation — the agent loses access to earlier tool results and cannot accurately quote or reference them.
Method: cc-relay `_scan_budget_enforcement()` measures tool result sizes in outgoing requests (a sketch follows the flag list below).
GrowthBook Flags (confirmed active):
- `tengu_hawthorn_window`: 200,000 (aggregate cap)
- `tengu_pewter_kestrel`: `{global: 50000, Bash: 30000, Grep: 20000, Snip: 1000}`
- `tengu_summarize_tool_results`: True
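cc-relay's detector isn't reproduced here, but the check reduces to summing tool_result sizes against the aggregate cap. A minimal sketch under assumed shapes; the 50-char `suspiciously_small` cutoff is a guess loosely matching the ≤49-char results observed, not the real criterion:

```python
import json

AGGREGATE_CAP = 200_000   # tengu_hawthorn_window
SUSPICIOUS_MAX = 50       # chars; assumed cutoff, real criterion unknown

def scan_budget(body: dict) -> dict:
    """Sum tool_result sizes (approximated via JSON length) and flag tiny ones."""
    total, small = 0, []
    for i, msg in enumerate(body.get("messages", [])):
        content = msg.get("content")
        if not isinstance(content, list):
            continue
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_result":
                size = len(json.dumps(block.get("content", "")))
                total += size
                if size <= SUSPICIOUS_MAX:
                    small.append((i, size))
    return {"total_chars": total,
            "exceeded": total > AGGREGATE_CAP,
            "suspiciously_small": small}
```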
Results:
- Budget exceeded detected at 242,094 chars (> 200K cap)
- 261 budget events with `suspiciously_small` marker
- Tool results reduced to 1-41 chars in this session (originally thousands of chars; full-week max: 49 chars)
- Examples: `msg[32]` 1 char, `msg[42]` 2 chars, `msg[172]` 22 chars (x33 occurrences)
v2.1.91 Verification:
- 71 budget events in v2.1.91 test session
- Same patterns — `maxResultSizeChars` override is MCP-only; built-in tools unaffected
Impact: The paid-for 1M-token context effectively has a 200K-character tool-result ceiling for built-in tools. Beyond this, older results are silently truncated.
Method: `tools/jsonl_analyzer.py` classifies PRELIM vs FINAL entries.
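How the inflation factor falls out of the classification, as a minimal sketch. The PRELIM/FINAL rule here (entries sharing a message id are interim writes of one API call; only the last per id is FINAL) is an assumption, not `jsonl_analyzer.py`'s confirmed logic:

```python
from collections import OrderedDict

def inflation_factor(entries: list[dict]) -> float:
    """Total input tokens across all entries divided by FINAL-only tokens."""
    last_by_id: OrderedDict[str, int] = OrderedDict()
    total = 0
    for e in entries:
        msg = e.get("message") or {}
        usage = msg.get("usage") or {}
        tokens = (usage.get("input_tokens", 0)
                  + usage.get("cache_creation_input_tokens", 0)
                  + usage.get("cache_read_input_tokens", 0))
        total += tokens
        if msg.get("id"):
            # Later writes overwrite earlier ones: the last entry per id is FINAL.
            last_by_id[msg["id"]] = tokens
    final_total = sum(last_by_id.values())
    return total / final_total if final_total else 0.0
```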
Results across sessions:
| Session | Size | PRELIM | FINAL | Ratio | Inflation |
|---|---|---|---|---|---|
| Current main | 1115KB | 79 | 82 | 0.96x | 2.87x |
| Previous session | 118KB | 16 | 6 | 2.67x | — |
| Sub-agent | 326KB | 39 | 20 | 1.95x | — |
| Sub-agent | 186KB | 12 | 7 | 1.71x | — |
Key: The "inflation" factor measures total input tokens (all entries) vs FINAL-only tokens = 2.87x for current session. Sub-agents consistently show higher PRELIM ratios.
Open Question: Does the server-side rate limiter count PRELIM entries? If yes, thinking sessions are charged 2-3x more against the rate limit.
The v2.1.90 changelog explicitly fixes the `--resume` cache miss regression (present since v2.1.69).
v2.1.91 adds a transcript chain break fix.
Indirect evidence from existing data (a reproduction sketch follows this list):
- 45 requests after 5+ min idle gap analyzed
- Average cache_creation% after >5min gap: 18.2% (vs 6.7% for <30s gap)
- Most 5-min gaps showed 96%+ cache read — TTL appears longer than 5 min in practice
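A sketch of reproducing this gap analysis against the proxy's SQLite log. The `requests` table and column names are assumptions about `usage.db`'s schema, as is `ts` being epoch seconds:

```python
import sqlite3
from pathlib import Path

con = sqlite3.connect(str(Path.home() / ".cc-relay" / "usage.db"))
rows = con.execute(
    "SELECT ts, cache_creation_input_tokens, cache_read_input_tokens "
    "FROM requests ORDER BY ts").fetchall()

prev_ts = None
for ts, create, read in rows:
    # Report cache_creation share for requests arriving after a 5+ min idle gap.
    if prev_ts is not None and ts - prev_ts > 300 and (create + read):
        pct = 100.0 * create / (create + read)
        print(f"gap {ts - prev_ts:6.0f}s  cache_creation {pct:5.1f}%")
    prev_ts = ts
```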
Cache TTL (Layer 0, from @luongnv89's cache TTL analysis)
External finding: 13-hour idle → 350K token cache rebuild = 9% session budget. Our data: 5-26 min idle → 96%+ cache maintained. TTL expiration requires much longer gaps.
Assessment: Cache TTL is a design choice, not a bug. Impacts long-idle scenarios (hours), not normal work patterns.
| Metric | v2.1.90 | v2.1.91 |
|---|---|---|
| Cache warming (cold→97%) | ~5 requests | ~3 requests |
| Stable session | 95-99% | 97-99% |
| Microcompact | Active | Active (unchanged) |
| Budget enforcement | Active | Active (MCP override only) |
| Session ID header | Not captured | Working |
| Synthetic RL (B3) | Present | Not re-tested (no fix in changelog) |
| JSONL duplication (B8) | 2.87x | Not re-tested (no fix in changelog) |
Setup: Same prompt, same proxy, same machine, same version (v2.1.91), run in parallel.
| Metric | npm (Node.js) | Standalone (ELF) |
|---|---|---|
| Requests | 7 | 7 |
| Duration | 60s | 69s |
| Overall cache | 82.8% | 82.8% |
| Cold start | 84.7% | 84.7% |
| Stable range | 88-97% | 77-89% |
| Max body size | 190KB | 210KB |
| Microcompact events | 4 | 4 |
| Budget events | 0 | 0 |
Key finding: v2.1.91 closes the Sentinel gap. On v2.1.90, standalone cold start was 47-67% vs npm's 79-87%. On v2.1.91, both start at identical 84.7%. The stripAnsi Bun optimization and other v2.1.91 changes appear to have neutralized the Sentinel issue.
npm (Node.js) timeline (Read % = cache_read / (cache_creation + cache_read)):

| Time | Cache Create | Cache Read | Read % | Body KB | Note |
|---|---|---|---|---|---|
| 10:14:36 | 4,628 | 25,576 | 84.7% | 109 | cold start |
| 10:14:37 | 0 | 0 | 0.0% | 14 | sub-agent init |
| 10:14:49 | 13,970 | 25,576 | 64.7% | 139 | context growth |
| 10:14:56 | 1,055 | 39,546 | 97.4% | 143 | stabilizing |
| 10:14:56 | 0 | 0 | 0.0% | 13 | sub-agent init |
| 10:15:04 | 5,376 | 40,601 | 88.3% | 160 | warming |
| 10:15:36 | 11,903 | 45,977 | 79.4% | 190 | new context burst |
Standalone (ELF) timeline:

| Time | Cache Create | Cache Read | Read % | Body KB | Note |
|---|---|---|---|---|---|
| 10:14:35 | 4,628 | 25,576 | 84.7% | 109 | cold start (identical to npm) |
| 10:14:35 | 0 | 0 | 0.0% | 14 | sub-agent init |
| 10:14:45 | 5,625 | 30,204 | 84.3% | 137 | context growth |
| 10:14:53 | 4,318 | 35,829 | 89.2% | 142 | warming |
| 10:15:10 | 11,807 | 40,147 | 77.3% | 173 | new context burst |
| 10:15:10 | 0 | 0 | 0.0% | 13 | sub-agent init |
| 10:15:44 | 11,761 | 51,954 | 81.5% | 210 | heavy read |
17 files read + 4 large grep/find commands. This session was designed to push past the 200K budget threshold.
| Time | Cache Create | Cache Read | Read % | Body KB | Note |
|---|---|---|---|---|---|
| 10:07:26 | 11,762 | 18,727 | 61.4% | 110 | cold start |
| 10:07:26 | 0 | 0 | 0.0% | 14 | sub-agent |
| 10:07:36 | 5,806 | 30,489 | 84.0% | 147 | warming |
| 10:07:45 | 6,093 | 36,295 | 85.6% | 149 | warming |
| 10:07:54 | 1,060 | 42,388 | 97.6% | 154 | stable |
| 10:08:00 | 1,449 | 43,448 | 96.8% | 158 | stable |
| 10:08:15 | 747 | 44,897 | 98.4% | 160 | stable |
| 10:08:22 | 1,469 | 45,644 | 96.9% | 166 | stable |
| 10:08:38 | 702 | 47,113 | 98.5% | 168 | stable |
| 10:08:38 | 0 | 0 | 0.0% | 13 | sub-agent |
| 10:08:56 | 6,018 | 47,815 | 88.8% | 224 | heavy context growth |
| 10:08:56 | 0 | 0 | 0.0% | 18 | sub-agent |
| 10:09:10 | 48,218 | 25,576 | 34.7% | 277 | cache rebuild after context shift |
| 10:10:06 | 13,615 | 73,794 | 84.4% | 294 | recovering |
10 microcompact events detected during this session. 0 budget events: the 200K cap counts aggregate tool-result characters, not body bytes, so tool results likely stayed under the threshold despite the 294KB body.
This data shows what happens in a long session where body size exceeds 500KB and microcompact is actively clearing:
| Time | Create | Read | Read % | Body KB | MC Active? |
|---|---|---|---|---|---|
| 10:20:29 | 237,490 | 8,237 | 3.4% | 746 | YES |
| 10:20:38 | 801 | 245,727 | 99.7% | 749 | YES |
| 10:20:52 | 159 | 246,528 | 99.9% | 749 | YES |
| 10:20:57 | 206 | 246,687 | 99.9% | 750 | YES |
| 10:21:14 | 5,986 | 246,893 | 97.6% | 769 | |
| 10:21:35 | 330 | 252,879 | 99.9% | 770 | YES |
| 10:22:15 | 442 | 253,209 | 99.8% | 771 | YES |
| 10:22:27 | 2,345 | 253,651 | 99.1% | 779 | YES |
| 10:22:33 | 641 | 255,996 | 99.8% | 781 | YES |
| 10:23:15 | 6,896 | 257,309 | 97.4% | 803 | YES |
Key observation: even at 800KB+ body with active microcompact clearing, cache read stays 97-99.9% — confirming that microcompact's stable substitution preserves the prompt prefix. The one outlier (3.4% at 10:20:29) is a cold-start/context-shift event, not caused by microcompact itself.
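A toy illustration of why the substitution is cache-friendly (not the real serializer): once a tool result is replaced by the fixed marker, consecutive requests serialize identical prefix bytes, so the cache written on request N is read back on request N+1.

```python
MARKER = "[Old tool result content cleared]"

# Request N and request N+1 after microcompact clears the same early block:
req_n  = ["system", "user", MARKER, "assistant", "tool-result-B"]
req_n1 = ["system", "user", MARKER, "assistant", "tool-result-B", "user-turn-2"]

# Count how many leading blocks are byte-identical between the two requests.
shared = 0
for a, b in zip(req_n, req_n1):
    if a != b:
        break
    shared += 1
print(f"shared prefix blocks: {shared}/{len(req_n)}")  # 5/5 -> full cache reuse
```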
| Metric | v2.1.90 npm | v2.1.90 standalone | v2.1.91 npm | v2.1.91 standalone |
|---|---|---|---|---|
| Overall | 86.4% | 86.2% | 82.8% | 82.8% |
| Cold start | 79-87% | 47-67% | 84.7% | 84.7% |
| Stable | 95-99.8% | 95-99.7% | 88-97% | 77-89% |
| Sub-agent cold | 79-87% | 14-47% | — | — |
Note: v2.1.91 "overall" appears lower (82.8% vs 86%) because the test session was shorter (7 requests vs 11+ scenarios in v2.1.90 benchmark). Stable-phase readings on longer v2.1.91 sessions show 97-99%, consistent with v2.1.90.
Recommendation update: On v2.1.91, either installation is fine. npm retains a theoretical advantage (no Sentinel code path) but the practical difference is now negligible.
- cc-relay v2 — transparent proxy with microcompact detection + budget enforcement scanning + session ID support
- `tools/jsonl_analyzer.py` — PRELIM/FINAL classification, synthetic detection, duplicate tool call detection
- `tools/gb_watcher.py` — GrowthBook feature flag monitoring (11 keys; a polling sketch follows this list)
- cc-relay SQLite DB — 3,500+ logged requests across multiple sessions
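For reference, a minimal sketch of `gb_watcher.py`-style polling. The endpoint URL, client key, and exact payload shape are assumptions about GrowthBook's features API, not taken from the actual tool:

```python
import json
import time
import urllib.request

# Placeholder endpoint; GrowthBook serves feature payloads roughly as
# {"features": {key: {"defaultValue": ...}, ...}} (assumed shape).
FEATURES_URL = "https://cdn.growthbook.io/api/features/<client-key>"
KEYS = ["tengu_hawthorn_window", "tengu_pewter_kestrel",
        "tengu_summarize_tool_results"]  # 3 of the 11 watched keys

def snapshot() -> dict:
    with urllib.request.urlopen(FEATURES_URL) as resp:
        features = json.load(resp)["features"]
    return {k: features.get(k, {}).get("defaultValue") for k in KEYS}

prev = snapshot()
while True:
    time.sleep(60)
    cur = snapshot()
    for k in KEYS:
        if cur[k] != prev[k]:
            print(f"{k}: {prev[k]!r} -> {cur[k]!r}")
    prev = cur
```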
_meta["anthropic/maxResultSizeChars"]up to 500K — B5 MCP-only workaround--resumetranscript chain break fix — B2 additional fixEdit shorter old_string anchors— output token reduction
--resumefull cache miss fix (since v2.1.69) — B2 main fixper-turn JSON.stringify elimination— B1 improvementrate-limit dialog infinite loop— UX fixautocompact thrash loop detection— actually in v2.1.89
autocompact thrash loop3-retry stop — B6 partial fix (autocompact thrash loop — see 07_TIMELINE.md)tool schema bytes changing mid-session— B1 partial fixnested CLAUDE.md re-injection— context bloat fixStructuredOutput schema cache ~50% failure— separate cache bugmisleading "Rate limit reached" message— B3 message-only fix
X-Claude-Code-Session-Idheader — session tracking for proxiesRead tool compact format + deduplication— token reduction
idle-return 75min prompt— Cache TTL UX mitigationMCP tool descriptions 2KB cap— MCP overhead reduction"Improved p90 prompt cache rate"— unspecified cache improvement