All notable changes to OASIS will be documented in this file.
The format is based on Keep a Changelog.
- KSM now includes token efficiency as a third scoring factor — models that burn excessive tokens get penalized up to 30% (#47, #50)
- Interactive export prompt after benchmark runs — copy share card or save HTML report (#48, #51)
- Share / export option in results browser detail menu
- Anthropic token undercount: `input_tokens` excludes cached tokens, now sums all three fields (#44, #45)
- Score label disambiguation: "Overall Score" → "Strategy Score" for LLM assessment, "Score" → "KSM" in table headers (#46, #49)
- Remaining label inconsistencies in markdown, text, and terminal analysis output (#54)
- Export prompt: `writeFileSync` crash on permission errors, unreachable no-analysis path, Ctrl+C mishandled (#55)
- curl stderr leaking to terminal during benchmark runs (#52, #53)
- Formula explainer now accurately describes KSM calculation
- Updated KSM-SCORING.md and README.md to document token efficiency factor
- 363 tests passing (was 346)
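The token-efficiency factor above can be pictured with a small sketch. Only the 30% cap comes from the changelog; the baseline comparison and linear ramp below are illustrative assumptions, not OASIS's actual formula:

```typescript
// Hypothetical sketch of a token-efficiency penalty capped at 30%.
// `baselineTokens` and the 0.1 ramp are assumptions for illustration;
// only the 30% cap is stated in the changelog.
function applyTokenEfficiencyPenalty(
  score: number,
  tokensUsed: number,
  baselineTokens: number,
): number {
  // Fraction of tokens spent beyond the baseline (0 if under budget)
  const overuse = Math.max(0, tokensUsed / baselineTokens - 1);
  // Linear penalty, clamped at 30%
  const penalty = Math.min(0.3, overuse * 0.1);
  return score * (1 - penalty);
}
```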
- Fixed TOCTOU vulnerability in credential file writes — now uses atomic mode setting (0o600)
- Fixed world-readable result files — benchmark transcripts now written with 0o600 permissions
- Added path validation for `report --output` flag — prevents path traversal, warns on symlinks
- Added input validation for `--max-iterations` — rejects NaN and negative values
- Ollama analyzer now defaults to benchmark model instead of hardcoded llama3.3 (#33)
- API calls now timeout after 120s instead of hanging indefinitely on network issues
- Fixed `analysis: any` type annotations — now uses proper `AnalysisResult` interface
- gradient-string now has graceful fallback for terminals without truecolor support (Windows)
- Deduplicated score bar rendering — now uses shared `renderScoreBar()` helper
- Added 25 new tests (346 total) — XSS escaping, timeout behavior, score edge cases
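The credential-write fix above amounts to setting the file mode at creation time instead of chmod-ing afterwards, which closes the check-to-use window. A minimal sketch of that pattern (the path and helper name are hypothetical, not OASIS's code):

```typescript
import { writeFileSync, unlinkSync, existsSync } from "node:fs";

// Passing `mode` to writeFileSync applies the permission atomically at
// creation, avoiding the TOCTOU race of write-then-chmodSync. Note the
// mode only takes effect when the file is created, hence the unlink.
function writeCredentials(path: string, contents: string): void {
  if (existsSync(path)) unlinkSync(path);
  writeFileSync(path, contents, { mode: 0o600 });
}
```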
- Polished CLI output: gradient banners, boxed layouts, cli-table3 tables throughout
- Live model fetching from provider APIs (config flow + run wizard)
- Share card report format (`oasis report <id> -f share`) — compact markdown for Discord/GitHub
- Standalone HTML report format (`oasis report <id> -f html`) — dark-themed, no external deps
- Clipboard support (`oasis report <id> -f share --clipboard`)
- ATT&CK technique classification now runs on every command during benchmarks (was always null)
- Analyzer backfills step-level techniques from LLM stepsUsed mapping
- Updated provider model lists to current (Claude Opus 4.6, o3, Grok 4, Gemini 2.5 Pro)
- Interactive run wizard uses live model list with spinner + fallback to examples
- Back-navigation wizard integrated with live model fetching
- `executeAndRecordStep` helper now includes technique classification
- Bumped `@anthropic-ai/sdk` ^0.71.2 → ^0.78.0
- Bumped `openai` ^4.0.0 → ^6.25.0 (added type guard for v6 union type in runner)
- CLI `--version` now reads from package.json instead of hardcoded value
- Docker auto-start on macOS when daemon isn't running
- Per-image ARM64 fallback (only emulates containers that need it)
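The openai v6 bump above required narrowing a union return type in the runner. The snippet below is a generic illustration of the user-defined type-guard pattern; the `TextChunk`/`ToolChunk` names are hypothetical and not the actual openai v6 types:

```typescript
// Generic type-guard pattern for narrowing a discriminated union.
// The union members here are invented for illustration.
type TextChunk = { type: "text"; text: string };
type ToolChunk = { type: "tool_call"; name: string };
type Chunk = TextChunk | ToolChunk;

// `chunk is TextChunk` tells the compiler the narrowed type on success
function isTextChunk(chunk: Chunk): chunk is TextChunk {
  return chunk.type === "text";
}

function collectText(chunks: Chunk[]): string {
  return chunks.filter(isTextChunk).map((c) => c.text).join("");
}
```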
- KSM score could exceed 100 when rubric total exceeded 100 points (#29)
- Ollama benchmarks failed with missing OPENAI_API_KEY error (#28)
- Updated provider model lists: added Gemini 3 Flash, Grok 3/4
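The over-100 bug in #29 is the classic rubric-normalization pitfall: dividing by an assumed 100 instead of the rubric's actual point total. A minimal sketch of the corrected shape (an assumption for illustration, not OASIS's exact code):

```typescript
// Normalize earned points against the rubric's real total, then clamp,
// so a rubric worth e.g. 120 points can no longer yield a score > 100.
function normalizeScore(earned: number, rubricTotal: number): number {
  if (rubricTotal <= 0) return 0;
  return Math.min(100, (earned / rubricTotal) * 100);
}
```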
- CLI tool with commands: `run`, `analyze`, `results`, `report`, `challenges`, `config`, `validate`, `providers`
- Multi-provider support: Anthropic, OpenAI, xAI, Google, Ollama, custom endpoints
- LLM-powered post-run analysis with MITRE ATT&CK mapping
- Kryptsec Scoring Model (KSM) with objective + qualitative rubric scoring
- Multiple report formats: terminal, text, JSON, markdown
- Challenge validation against JSON schema
- Rate-limit retry with exponential backoff
- Results summary with OWASP category grouping (`oasis results summary`)
- Challenge comparison view (`oasis results compare --challenge <id>`)
- XDG-compliant configuration (`~/.config/oasis/`)
- 153 automated tests (unit + E2E)
- CI/CD pipeline (GitHub Actions)
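The rate-limit retry listed above follows the standard exponential-backoff pattern. A sketch under assumed parameters (base delay, attempt count, and the rate-limit predicate are not specified in the changelog):

```typescript
// Retry an async call with exponential backoff: 1s, 2s, 4s, ...
// maxAttempts, baseDelayMs, and isRateLimit are illustrative assumptions.
async function withRetry<T>(
  fn: () => Promise<T>,
  isRateLimit: (err: unknown) => boolean,
  maxAttempts = 5,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Re-throw non-rate-limit errors and exhausted retries immediately
      if (!isRateLimit(err) || attempt + 1 >= maxAttempts) throw err;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```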