🇰🇷 Korean version

# Quick Start — Setup & Self-Diagnosis

Practical steps to minimize token drain. Three approaches: pin v2.1.109 (recommended as of April 17), stay current (v2.1.91+), or downgrade to v2.1.63. For the technical analysis behind these recommendations, see 01_BUGS.md.


## ⚠️ Opus 4.7 Warning (April 17, 2026)

Do not upgrade Claude Code past v2.1.109. Opus 4.7 (shipped with v2.1.111) introduces a 2.4x averaged Q5h burn rate (independently measured by two interceptors), model pin bypass (settings.json ignored), and cache metering anomaly (~7x reported overcharge). v2.1.109 sends explicit claude-opus-4-6 model IDs and is unaffected by the April 23 API default switchover. Full analysis: 16_OPUS-47-ADVISORY.md.


## Which Approach Is Right for You?

| | Option C: Pin v2.1.109 (recommended) | Option A: Stay Current (v2.1.91+) | Option B: Downgrade (v2.1.63) |
|---|---|---|---|
| Opus 4.7 | Not exposed — sends explicit 4.6 model ID | ⚠️ v2.1.111+ defaults to 4.7, model pin ignored | Not exposed (predates 4.7) |
| Cache | Bugs B1/B2 fixed, native 1h cache | B1/B2 fixed — 95-99% cache hit | No cache bugs (predates them) |
| Features | Full: hooks, skills, plugins, 1M context, Agent SDK | Full access | No hooks, no skills, limited plugins |
| Token cost | Opus 4.6 rates (no +35% tokenizer) | ⚠️ If on 4.7: +35% tokenizer + 2.4x thinking overhead | Lowest baseline (~68K avg) |
| New bugs | B3-B11 present. Workarounds exist | B3-B11 + 4.7 issues (#49302, #49503, #49541) | Not exposed to B8a-B11 |
| Best for | Most users — full features without 4.7 risks | Users who need latest CC features (post-v2.1.109) | Cost efficiency over features |

Independent confirmations for v2.1.63 downgrade:

  • @wpank: 47,810 requests tracked — 2.2x more output per dollar, 68% cheaper per request (caveat: session length mismatch in comparison)
  • @janstenpickle: "1.5 hours with latest version to go nowhere and 5 minutes with downgraded version"
  • @diabelko: "token usage dropped to 1-3% of limits"
  • @breno-ribeiro706: "better results but still faster token consumption than a month ago" — server-side issues persist regardless of CLI version

## Option A: Stay Current (v2.1.91+)

### npm Installation (strongly recommended)

```shell
# Install via npm
npm install -g @anthropic-ai/claude-code

# Verify
claude --version         # should show 2.1.91 or later
file "$(which claude)"   # should show a symbolic link to cli.js
```
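Once installed, it is worth guarding against silent drift below the minimum you intend to run. A minimal sketch — `version_ge` is a helper defined here, not part of Claude Code, and the `installed=` line is a stand-in for parsed `claude --version` output:

```shell
# Compare dotted version strings using version sort (GNU coreutils sort -V)
version_ge() {
  # True if $1 >= $2 in version order
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

installed="2.1.91"   # stand-in for: installed=$(claude --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+')
if version_ge "$installed" "2.1.91"; then
  echo "OK: $installed has the B1/B2 cache fixes"
else
  echo "WARN: $installed predates the B1/B2 cache fixes"
fi
```

The same helper works for checking you have not crossed the v2.1.109 ceiling: `version_ge 2.1.109 "$installed"`.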

### npm vs Standalone — Which to Choose?

The standalone binary is a bundled Bun executable. The Sentinel cache bug (B1) is fixed in v2.1.91, so both installations perform comparably out of the box. The difference is in what community fixes you can apply.

Community tools use three interception approaches, each with different compatibility:

| Approach | npm | Standalone | Representative tools |
|---|---|---|---|
| ANTHROPIC_BASE_URL proxy | Works | Works | X-Ray Interceptor, Pruner, cc-trace (mitmproxy) |
| Claude Code Hooks API (settings.json) | Works | Works | OpenWolf (★363), cc-trace (Go/Shell hooks) |
| NODE_OPTIONS --import preload | Works | Does not work | claude-code-cache-fix |

The key trade-off: cnighswonger's cache-fix interceptor — which normalizes block positions, sorts tool definitions, and strips image carry-forward — is the most impactful single fix (98.3% cache hit). It requires NODE_OPTIONS, so it is npm-only. If you want this specific fix, you need npm.

Proxy-based tools and Hooks API tools (including OpenWolf) work with both installations. If you are on standalone and don't need the cache-fix interceptor, you can stay.
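For the Hooks API route, tools register shell commands under the `hooks` key in `settings.json`. A minimal sketch of that shape — the `PostToolUse` event name follows Claude Code's hooks schema, but the matcher and script path below are placeholders, not taken from any specific tool:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "~/.claude/log-usage.sh" }
        ]
      }
    ]
  }
}
```

Because this goes through the official settings mechanism rather than a Node preload, it works on both npm and standalone installs.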

| Metric | npm (v2.1.91) | Standalone (v2.1.91) |
|---|---|---|
| Cache cold start | 84.5% | 27.8-84.7% (varies) |
| Cache stable session | 98-99.6% | 94-99% |
| Community fix support | All 3 approaches | Proxy + Hooks only |

### Pin Your Version

Disable auto-update by adding to `~/.claude/settings.json`:

```json
{
  "model": "claude-opus-4-6",
  "effortLevel": "high",
  "env": {
    "DISABLE_AUTOUPDATER": "1",
    "ANTHROPIC_MODEL": "claude-opus-4-6"
  }
}
```

Why pin v2.1.109 specifically? It is the most recent version that (1) sends explicit claude-opus-4-6 model IDs (verified), (2) has native 1h cache (cache_creation_5m=0), (3) is not affected by the v2.1.111+ model pin bypass (#49503), and (4) will not be affected by the April 23 API default switchover. See 16_OPUS-47-ADVISORY.md for full analysis.
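If you script your environment, the same pin can be applied non-interactively. A sketch assuming `jq` is installed; the `/tmp` default here is a demo placeholder so the snippet is safe to dry-run — point `SETTINGS` at your real `~/.claude/settings.json` when applying it:

```shell
# Merge the pin keys into settings.json without clobbering other settings
SETTINGS="${SETTINGS:-/tmp/claude_settings_demo.json}"
[ -f "$SETTINGS" ] || echo '{}' > "$SETTINGS"

tmp=$(mktemp)
jq '.env.DISABLE_AUTOUPDATER = "1" | .env.ANTHROPIC_MODEL = "claude-opus-4-6"' \
  "$SETTINGS" > "$tmp" && mv "$tmp" "$SETTINGS"
```

The merge only sets the two env keys, so an existing `model` or `effortLevel` entry is left untouched.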

### Recommended Environment Variables

Add to the `"env"` section of `~/.claude/settings.json`:

```json
{
  "env": {
    "CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING": "1"
  }
}
```

This disables the adaptive thinking zero-reasoning bug (B11, Anthropic acknowledged). Optionally, set effort to high (the default medium, effort=85, may under-allocate reasoning), either here or per-session via `/effort high` or `/effort max`.

### Optional: Cache Fix Interceptor

@cnighswonger's claude-code-cache-fix patches three remaining cache-busting issues that v2.1.91 did NOT fix:

  1. Attachment block scatter — skills/MCP/hooks blocks drift to wrong positions on resume (52-73% of calls affected)
  2. Non-deterministic tool ordering — tool definitions arrive in random order (98% of calls affected)
  3. Image carry-forward — base64 images from Read tool persist in all subsequent calls (up to 266K tokens/turn)

```shell
npm install -g claude-code-cache-fix
```

Then add to `~/.claude/settings.json`:

```json
{
  "env": {
    "NODE_OPTIONS": "--import claude-code-cache-fix"
  }
}
```

Reported result: 98.3% cache hit rate on resumed sessions (4,700 calls over 8 days). The interceptor is read-only — it normalizes request structure before it hits the API, without modifying your conversation.


## Option B: Downgrade to v2.1.63

v2.1.63 (February 28, 2026) is the last version before the "Output efficiency" system prompt change (v2.1.64, March 3). Multiple users independently report better code quality and lower token consumption.

### Install

```shell
# Direct install of a specific version
curl -fsSL https://claude.ai/install.sh | bash -s 2.1.63

# Verify
claude --version   # should show 2.1.63
```

### Pin to Prevent Auto-Update

Same as Option A: disable the auto-updater in `~/.claude/settings.json`:

```json
{
  "env": {
    "DISABLE_AUTOUPDATER": "1"
  }
}
```

### What You Lose

  • Hooks — PreToolUse, PostToolUse, and UserPromptSubmit hooks are not available
  • Skills — plugin skills and /skill commands are not available
  • Some MCP features — the maxResultSizeChars override (v2.1.91) is not available
  • Cache fixes — B1/B2 fixes are not present (but v2.1.63 predates the bugs that B1/B2 fix, so this is not a regression)
  • Agent SDK features — newer orchestrator features may not work

### What You Keep

  • Core Claude Code functionality (Read, Edit, Write, Bash, Grep, Glob, Agent)
  • API compatibility (the Messages API is backward compatible)
  • CLAUDE.md and memory files
  • MCP server connections (basic)

### Important Caveat

Server-side issues persist regardless of CLI version. @breno-ribeiro706 confirmed: "still faster token consumption than a month ago." Bugs B4 (microcompact), B5 (budget cap), and server-side quota changes are controlled server-side via GrowthBook flags — no client downgrade can fix them. The improvement from downgrading is real but partial.


## Behaviors to Avoid (Both Options)

| Behavior | Why | Impact |
|---|---|---|
| --resume / --continue | Replays entire conversation history as billable input | 500K+ tokens per resume. Start fresh sessions instead |
| /dream, /insights | Background API calls consume tokens without visible output | Silent drain |
| /branch | Duplicates or un-compacts message history (B9) | Can inflate 6% → 73% context in one message |
| /release-notes | Injects full changelog (~40-60K tokens) into context for rest of session | Silent context bloat |
| Multiple terminals simultaneously | No cache sharing between terminals, parallel quota drain | Limit to one active terminal |
| Large CLAUDE.md / many plugins | Sent on every turn, compounds with tool definitions | Keep CLAUDE.md lean. 47 skills = ~70K overhead/turn |
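To gauge what a large CLAUDE.md costs per turn, a rough sketch using the common ~4-characters-per-token heuristic — an approximation, not Anthropic's tokenizer, and `estimate_tokens` is a hypothetical helper:

```shell
# Approximate token count of a file (characters / 4 rule of thumb)
estimate_tokens() {
  bytes=$(wc -c < "$1")
  echo $(( (bytes + 3) / 4 ))
}

# Example: a 20,000-character CLAUDE.md is resent on every turn
printf '%20000s' '' > /tmp/claude_md_sample
estimate_tokens /tmp/claude_md_sample   # → 5000
```

Run it against your real `~/.claude/CLAUDE.md` to decide whether it needs trimming.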

## Behaviors to Adopt

| Behavior | Why | Benefit |
|---|---|---|
| Start fresh sessions often | The 200K tool result cap (B5) silently truncates older results | Clean context = better model performance |
| /effort high or /effort max | Default medium (effort=85) can under-allocate reasoning (B11) | Better output quality, reduces retry waste |
| CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1 | Prevents zero-reasoning turns that produce fabrications (B11) | Anthropic acknowledged this bug |
| Monitor /usage and /context | Catch abnormal drain early | Know when to restart |
| Pin your CLI version | Future updates may reintroduce regressions | Stability |

## Self-Diagnosis

### Quick Check: Are You Affected?

| Symptom | Likely Cause |
|---|---|
| "Rate limit reached" immediately, 0% usage | B3 (false rate limiter) or server billing bug |
| Session drains 50%+ in <30 min | Multiple bugs compounding, check cache ratio |
| Model takes shortcuts, "simplest fix" | B11 (adaptive thinking) + Output efficiency prompt |
| Tool results seem missing mid-session | B4/B5 (microcompact + budget cap) |
| --resume costs massive tokens | B2/B2a (cache miss on resume) |
| Session won't load, 400 error | B8a (JSONL corruption from concurrent tools) |
| /branch immediately fills context | B9 (/branch inflation) |

### Check Your Cache Ratio (Session JSONL)

Session files in ~/.claude/projects/ contain usage data for each turn:

```shell
# Find your latest session
ls -lt ~/.claude/projects/*/sessions/ | head -5

# Quick cache ratio check (using jq)
jq -r '
  select(.type == "assistant" and .costInfo) |
  "\(.costInfo.cacheReadInputTokens // 0) / \(.costInfo.cacheCreationInputTokens // 0)"
' <session-file>.jsonl | head -20
```
  • Healthy session: cache_read >> cache_creation (read ratio > 80%)
  • Affected session: cache_creation >> cache_read (read ratio < 40%)
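The per-turn `read / creation` pairs printed by the jq command above can be rolled up into a single session-level figure. A pure-awk sketch — `aggregate_ratio` is a helper defined here; field 1 is the cache-read count, field 3 the cache-creation count:

```shell
# Sum cache reads and creations across turns, print the overall read ratio
aggregate_ratio() {
  awk '{ r += $1; c += $3 }
       END { if (r + c > 0) printf "read ratio: %.1f%%\n", 100 * r / (r + c) }'
}

# Example: two turns, 9,000 read vs 1,000 created overall
printf '8000 / 1000\n1000 / 0\n' | aggregate_ratio   # → read ratio: 90.0%
```

Pipe the jq output into `aggregate_ratio` and compare against the healthy/affected thresholds above.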

If most sessions show low read ratios on v2.1.91+, the cache-fix interceptor (Option A) or a downgrade (Option B) may help.

### Optional: Monitor via Proxy

```shell
# Route through a local proxy for real-time monitoring
ANTHROPIC_BASE_URL=http://localhost:8080 claude

# Parse cache_creation_input_tokens / cache_read_input_tokens from responses
# Healthy: read ratio > 80%  |  Affected: read ratio < 40%
```
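To spot-check a single captured response body, the counters live under `usage`. A sketch assuming `jq` and a response saved to a file — `/tmp/response_sample.json` and the numbers in it are fabricated sample data, not real billing figures:

```shell
# Write a sample response body (shape matches the usage fields named above)
cat > /tmp/response_sample.json <<'EOF'
{"usage": {"input_tokens": 12, "cache_read_input_tokens": 95000, "cache_creation_input_tokens": 1200}}
EOF

# Extract and label the cache counters (missing fields default to 0)
jq -r '.usage
       | "\(.cache_read_input_tokens // 0) read / \(.cache_creation_input_tokens // 0) created"' \
  /tmp/response_sample.json   # → 95000 read / 1200 created
```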

Community monitoring tools (X-Ray Interceptor, Pruner, cc-trace, OpenWolf) are listed in the interception-approach table above.


See 01_BUGS.md for technical root causes, 04_BENCHMARK.md for npm vs standalone benchmarks, and 12_ADVANCED-GUIDE.md for advanced configuration.