Concrete tier choices for opencode-sdlc-wizard users — what the SDLC
loop actually costs at three monthly budgets, what trade-offs each
budget forces, and which models clear the SDLC capability floor in each
slot.
This is the document that complements the four privacy tiers in
PRIVACY.md. Privacy is "where does my data go?" Cost is "how much
am I paying for ceiling vs floor?" Most teams need to balance both.
Before picking a model, know the floor. The SDLC protocol (plan → TDD → self-review → optional cross-model review) requires:
- Decent instruction-following — multi-step tool calls without hallucinated arguments.
- Long-context reasoning — tracking a plan across 20-50 turns.
- Tool-use reliability — handling structured edits, bash output, schema validation.
| Class | Examples | Holds the protocol? |
|---|---|---|
| 7B–13B | Mistral 7B, Llama 3.2 8B, Qwen-Coder-7B | No. Fails partway. |
| 14B–22B | Phi-4 14B, DeepSeek-Coder-V2-16B, Codestral 22B | Borderline. Routine fixes OK; complex refactors slip. |
| 30B–70B code-tuned | Qwen2.5-Coder-32B, Qwen3-Coder, Llama 3.3 70B | Yes. The local sweet spot. |
| 100B+ open-weight | DeepSeek-V3.1 (671B MoE), GPT-OSS 120B | Yes. Strongest OSS reasoning. |
| Frontier proprietary | Claude Opus, GPT-5.5 xhigh, Gemini 2.5 Pro | Yes. Ceiling. |
Below the 30B-code-tuned bar, the wizard installs and runs but the protocol degrades. That's a capability outcome, not a port bug.
For hobbyists, students, or anyone evaluating the wizard before committing.
# Detect what's reachable, biased toward free providers
bash .opencode/scripts/detect-backends.sh --free-tier-first
# Pick the highest-leverage free option (verify the exact model ID
# against the provider's current catalog — see "Verifying current
# model IDs" below; these change faster than this doc ships)
bash .opencode/scripts/configure-backend.sh \
--tier hosted_oss --provider cerebras \
--model qwen-3-235b-a22b-instruct-2507| Slot | Provider | Example model (verify before use) | Why | Cost |
|---|---|---|---|---|
| Coder | Cerebras free | qwen-3-235b-a22b-instruct-2507 or gpt-oss-120b |
Fastest inference (~2000 tok/s), generous daily quota | $0 |
| Coder (alt) | Groq free | llama-3.3-70b-versatile |
Sub-second, free tier resets daily | $0 |
| Coder (alt) | Google AI Studio | gemini-2.5-flash (1M context) |
Generous daily request quota — verify current limits | $0 |
| Coder (alt) | OpenRouter | deepseek/deepseek-chat:free |
Aggregator routing, OSS models | $0 (rate-limited) |
| Reviewer | NVIDIA NIM | deepseek-ai/deepseek-r1 |
Reasoning model, free credits at build.nvidia.com | $0 |
| Reviewer (alt) | OpenRouter | qwen/qwen-3-coder:free |
Code-tuned, free routing | $0 (rate-limited) |
Verifying current model IDs. Provider catalogs change monthly. Before pinning a model, sanity-check the ID against the provider's live docs — Cerebras: https://inference-docs.cerebras.ai/models/overview, Google AI Studio: https://ai.google.dev/gemini-api/docs/models, NVIDIA NIM: https://build.nvidia.com, Groq: https://console.groq.com/docs/models. The IDs in this table were calibrated 2026-05-05; codex round-1 review caught
llama-3.3-70bas no longer in Cerebras's catalog andgemini-2.0-flashas deprecated (shutdown June 2026). Ifconfigure-backendwrites a stale ID,opencode runwill surface a clearmodel not founderror — fixable in seconds, not a footgun.
Real constraints at $0:
- Daily quotas reset, but agentic loops burn fast — one complex task can exhaust Groq's daily cap.
- OpenRouter
:freevariants are aggressively throttled under load. - Free tiers may log your prompts for training (read each provider's
policy). Privileged-data work belongs in
private_local, not free hosted. - No SLA. Outages happen.
What you give up vs paid: ceiling on novel architecture decisions, long-context refactors, and "this is gnarly, throw the strongest model at it" moments. For routine fixes + TDD loops, the gap is small.
Same budget, different trade-off: hardware capex instead of token spend. For privileged-data work, NDA contracts, or anyone who wants zero egress.
| Slot | Tool | Model | Hardware floor |
|---|---|---|---|
| Coder | Ollama | qwen3-coder:30b |
24GB VRAM (RTX 4090, M-series 32GB+) |
| Coder | Ollama | deepseek-coder-v2:16b |
16GB VRAM (RTX 4070 Ti, M Pro 16GB) |
| Coder | LM Studio | Qwen3-Coder-30B-Instruct |
Same as above |
| Coder | MLX (Apple Silicon) | mlx-community/Qwen2.5-Coder-32B-Instruct-4bit |
32GB unified memory; fastest on M-series |
| Reviewer | Ollama | qwen3-coder:30b (same instance) |
Use sequentially |
# Local + privacy-first
bash .opencode/scripts/configure-backend.sh \
--tier private_local --provider ollama \
--model qwen3-coder:30bConstraints:
- Hardware is the floor. <16GB VRAM = no serious local work for SDLC.
- Inference is slower than hosted (token/sec on a single consumer GPU vs Cerebras's wafer-scale farm).
- One model at a time means no parallel coder+reviewer (or you swap contexts).
Most professional engineers' sweet spot. One ceiling model where ceiling matters, free for the rest.
| Slot | Provider | Model | Cost | Why |
|---|---|---|---|---|
| Coder | Anthropic Claude (Pro) | claude-sonnet-4.6 |
$20/mo Pro sub or ~$3/M in via API | Best instruction-following + tool-use |
| Coder (alt) | OpenAI Plus | gpt-5.3-codex |
$20/mo Plus sub | Strong reasoning, especially with xhigh effort |
| Coder (alt) | Z.AI GLM Coding Plan | glm-5.1 |
$10/mo (or $30/quarter, $80/year — quarterly restructure May 2026; no flat-$18 SKU anymore) | Post-Anthropic-OAuth-ban migration target; verify at https://z.ai/subscribe |
| Reviewer | Cerebras free / Groq free | gpt-oss-120b |
$0 | Cheap second opinion; both Cerebras + Groq host gpt-oss-120b |
| Reviewer (alt) | DeepSeek direct | deepseek-v4-flash |
~$0.14/M in cache-miss (pennies/review) — verify current pricing | Strong OSS reasoning, cheapest hosted, V4 family shipped Apr 2026 |
| Reviewer (high-stakes) | Codex via API | gpt-5.3-codex xhigh |
~$1-3 per review at xhigh | When release-critical |
# v0.10.0: Mixed-Mode in one shot via `pick` — coder + reviewer split
# the work, OpenCode auto-routes review tasks via agent.review.model
npx opencode-sdlc-wizard pick \
--tier proprietary --provider anthropic \
--reviewer-tier hosted_oss --reviewer-provider cerebras
# Equivalent low-level invocation (what `pick` orchestrates under it):
bash .opencode/scripts/configure-backend.sh \
--tier proprietary --provider anthropic --model claude-sonnet-4.6 \
--reviewer-tier hosted_oss --reviewer-provider cerebras --reviewer-model gpt-oss-120bThe hybrid pattern that maximizes this budget:
- Coder = ceiling (Claude / GPT-5.5) for new code
- Reviewer = cheap (DeepSeek direct or Cerebras free) for the "second pair of eyes" pass
- Saves ~80% of review tokens vs running Claude on both sides without losing rigor
v0.10.0 ships Mixed-Mode in pick and configure-backend.sh —
emits both provider blocks plus agent.review.model so OpenCode
routes review tasks to the reviewer without any wrapper script.
For shops running CI gates on every PR, daily multi-hour agent loops, or anyone whose time is more valuable than tokens.
| Slot | Provider | Model | Approx monthly | Why |
|---|---|---|---|---|
| Coder | Anthropic | claude-opus-4.7 |
~$100-150 | Highest ceiling, 1M context tier |
| Coder (alt) | OpenAI | gpt-5.3-codex xhigh |
~$80-130 | Strongest reasoning at xhigh effort |
| Reviewer | Codex CLI (xhigh) | gpt-5.3-codex |
~$30-50 | Cross-model review on every release |
| Reviewer (parallel) | DeepSeek direct | deepseek-r1 |
~$10-20 | Cheap second-reviewer for triangulation |
| CI gate | Groq free / Cerebras free | gpt-oss-120b (both providers) |
$0 | Fast PR-review loops, free tier; v0.10.3 picker default |
The pattern that earns this budget:
- Three reviewers: codex (xhigh), DeepSeek-R1, plus the originating coder's self-review. Triangulated findings catch what any single reviewer misses.
- CI gates on every PR using a free fast reviewer (Groq) — the expensive reviewers fire only on release branches.
- 1M-context tier on the coder for whole-monorepo refactors.
Forget the budget bracket for a sec — pick by what the job is.
| Job | Best tier | Best model | Notes |
|---|---|---|---|
| Routine fix / typo / small CSS | hosted_oss free | Cerebras gpt-oss-120b or qwen-3-235b-a22b-instruct-2507 |
Fast + free + clears the bar |
| TDD on new feature, mid-stakes | $20 path | Claude Sonnet 4.6 | Tool-use is its strength |
| Long-context refactor | $200 path | Opus 4.7 (1M context) | Floor is "fits in context" |
| Security audit / privileged code | private_local | Qwen2.5-Coder-32B local | Zero egress |
| CI gating on every PR | hosted_oss free | Groq llama-3.3-70b-versatile |
Speed + cost dominate (Groq still ships Llama 3.3 70B; Cerebras dropped it) |
| Architecture decision (novel) | $200 path | Opus 4.7 + Codex xhigh review | Ceiling matters |
| Bulk doc / refactor sweeps | hosted_oss cheap-paid | DeepSeek-V3.1 direct | Cheap per token, decent quality |
| Air-gapped / compliance-locked | private_local | Whatever fits VRAM | Hardware floor matters more than ceiling |
The SDLC protocol (plan → TDD → self-review → optional cross-model review) is model-agnostic. The hooks fire under any backend OpenCode supports. The skills work with any model that clears the capability floor.
That's the actual point: the wizard is not about a specific tier winning. It's about the discipline holding regardless of what model you're paying (or not paying) for.
This is a snapshot — pricing and free-tier generosity shift faster than the wizard ships. If a number is stale by more than a few months or a provider has changed their tier structure, open an issue. The ratios (DeepSeek roughly an order of magnitude cheaper than Opus, Cerebras roughly an order of magnitude faster than hosted alternatives) tend to hold longer than the absolute prices.
Specific calibration notes for the prices/quotas in this doc:
- DeepSeek
deepseek-chat≈$0.14/M incache-miss (was$0.27/Min v0.8.0 — codex round-1 F4 caught the stale figure). Cache-hit pricing is meaningfully cheaper. Verify at https://api-docs.deepseek.com/quick_start/pricing. - Gemini 2.0 Flash is deprecated with shutdown 2026-06-01 — use
gemini-2.5-flashinstead. Free quota is per-model and changes; some quotas apply only to grounded prompts. Verify at https://ai.google.dev/gemini-api/docs/rate-limits. - Cerebras catalog rotates faster than other providers —
llama-3.3-70bwas in v0.8.0 of this doc, but as of 2026-05-05 the public production models arellama3.1-8b,gpt-oss-120b,qwen-3-235b-a22b-instruct-2507, andzai-glm-4.7. Verify at https://inference-docs.cerebras.ai/models/overview. - Together initial credit and Groq daily quota change frequently — always recheck.
Last calibrated: 2026-05-18 — v0.10.6 sweep. Refreshed gpt-5.5 →
gpt-5.3-codex (Feb 2026 release, most-pinned reviewer in surveyed
configs), deepseek-chat → deepseek-v4-flash (V4 family April 2026),
removed "Cerebras dropped llama-3.3-70b" qualifier from CI gate slot
since both Cerebras and Groq host gpt-oss-120b (the v0.10.3 picker
default). Added Z.AI GLM Coding Plan to $20/mo path with current
quarterly pricing ($10/$30/$80 — no flat $18/mo SKU since May 2026
restructure). Anthropic Sonnet 4.6 / Opus 4.7 unchanged.
Prior calibration: 2026-05-05 (codex round-1 corrections for v0.8.1).