
ClaudeCliBackend hardcodes --bare, breaking subscription-token auth (silent 0-score) #68

@nellylemmy

Bug: ClaudeCliBackend hardcodes --bare, which breaks subscription-token auth → silent 0-score

Repo: microsoft/SkillOpt
File: skillopt_sleep/backend.py, ClaudeCliBackend._call (and the attempt_with_tools variant)
Severity: High for any user whose claude CLI is authenticated by a logged-in subscription token rather than ANTHROPIC_API_KEY. The failure is silent: the cycle reports EXIT=0 but every model call returns an error string that scores 0.

Symptom

Running the Claude backend on a machine with no ANTHROPIC_API_KEY (auth via claude subscription login):

python -m skillopt_sleep dry-run --backend claude --scope invoked
=> n_tasks: 40, baseline 0.0 -> candidate 0.0, gate reject, edits: []

The llm_miner produces 0 checkable tasks and reflect proposes nothing — not because the sessions lack signal, but because every claude -p subprocess returns Not logged in · Please run /login, which the JSON parser turns into an empty/zero result.

Root cause

_call builds the command with --bare:

cmd = [
    self.claude_path, "-p", "--output-format", "text",
    "--bare",                              # <-- this
    "--disable-slash-commands",
    "--disallowedTools", "*",
    "--exclude-dynamic-system-prompt-sections",
]

--bare skips plugin/hook/LSP init and the credential-resolution path that loads the logged-in subscription token. With an ANTHROPIC_API_KEY in env it still authenticates; with subscription-token auth it does not.

Reproduction (subscription-auth machine, no API key)

# plain call, no isolation flags: authenticated
claude -p --output-format text -- "say pong"
# -> pong

# with --bare (what the backend uses)
claude -p --output-format text --bare --disable-slash-commands \
  --disallowedTools '*' -- "say pong"
# -> Not logged in · Please run /login

# without --bare, other isolation flags kept
claude -p --output-format text --disable-slash-commands \
  --disallowedTools '*' --exclude-dynamic-system-prompt-sections -- "say pong"
# -> pong          <-- auth + isolation both intact

Confirmed end-to-end: after dropping --bare, the same dry-run mined 8 checkable
tasks from real transcripts and reflect proposed 8 concrete edits (gate then
correctly rejected them because the baseline already scored 1.0 — a separate,
expected behavior, not this bug).

Fix (one line, conditional)

--bare is only safe when auth comes from ANTHROPIC_API_KEY. Gate it:

import os  # needed at the top of backend.py if not already imported

cmd = [self.claude_path, "-p", "--output-format", "text"]
if os.environ.get("ANTHROPIC_API_KEY"):
    # Safe with API-key auth; without a key, --bare skips loading the subscription token.
    cmd.append("--bare")
cmd += [
    "--disable-slash-commands",
    "--disallowedTools", "*",
    "--exclude-dynamic-system-prompt-sections",
]

The remaining flags (--disable-slash-commands, --disallowedTools '*',
--exclude-dynamic-system-prompt-sections, clean temp cwd) already provide the
isolation --bare was added for: no global skills, no tool use, no per-machine
system-prompt sections, no project CLAUDE.md. Dropping --bare for
subscription auth loses only the plugin/hook/LSP skip (a minor warm-up cost),
not isolation.

Apply the same change to attempt_with_tools (which also hardcodes --bare).
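
To keep the two call sites from drifting apart, the gate could live in one small helper that both splice into their argument lists. This is a minimal sketch, not code from the repository: the names bare_flags and build_call_cmd and the env parameter are hypothetical, and only the flags quoted from _call in this issue are used. The flags that attempt_with_tools passes are not shown here, so that method would reuse bare_flags alone.

```python
import os
from typing import List, Mapping, Optional


def bare_flags(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: return ["--bare"] only when an API key is set.

    An empty or missing ANTHROPIC_API_KEY is treated as subscription-token
    auth, in which case --bare must be left out.
    """
    env = os.environ if env is None else env
    return ["--bare"] if env.get("ANTHROPIC_API_KEY") else []


def build_call_cmd(claude_path: str,
                   env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: the _call command from this issue, with --bare gated."""
    return [
        claude_path, "-p", "--output-format", "text",
        *bare_flags(env),
        "--disable-slash-commands",
        "--disallowedTools", "*",
        "--exclude-dynamic-system-prompt-sections",
    ]
```

Passing env explicitly is only for testability; in the backend the default of os.environ would apply.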

Suggested hardening (separate)

A model call that returns a non-JSON error string ("Not logged in …") is silently
scored 0. Consider detecting known CLI error prefixes in _call and raising/logging
instead of returning them as content, so an auth failure surfaces loudly rather
than deflating every baseline to 0.
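
A rough sketch of that check, under stated assumptions: ClaudeCliError, KNOWN_ERROR_PREFIXES, and check_cli_output are hypothetical names, and "Not logged in" is the only prefix confirmed by this issue. Any further prefixes would need to be collected from real CLI output rather than guessed.

```python
class ClaudeCliError(RuntimeError):
    """Hypothetical exception: the CLI printed an error message, not model output."""


# Only "Not logged in" is confirmed by this issue; extend from observed output.
KNOWN_ERROR_PREFIXES = ("Not logged in",)


def check_cli_output(text: str) -> str:
    """Return text unchanged, or raise if it starts with a known CLI error prefix."""
    stripped = text.strip()
    for prefix in KNOWN_ERROR_PREFIXES:
        if stripped.startswith(prefix):
            raise ClaudeCliError(
                f"claude CLI returned an error instead of content: {stripped!r}"
            )
    return text
```

Calling check_cli_output on the subprocess stdout inside _call, before any parsing, would turn the auth failure into an exception on the first model call rather than a run of 0 scores.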
