All notable changes to OASIS will be documented in this file.
The format is based on Keep a Changelog.
- KSM now includes token efficiency as a third scoring factor — models that burn excessive tokens get penalized up to 30% (#47, #50)
- Interactive export prompt after benchmark runs — copy share card or save HTML report (#48, #51)
- Share / export option in results browser detail menu
- Anthropic token undercount: `input_tokens` excludes cached tokens, now sums all three fields (#44, #45)
- Score label disambiguation: "Overall Score" → "Strategy Score" for LLM assessment, "Score" → "KSM" in table headers (#46, #49)
- Remaining label inconsistencies in markdown, text, and terminal analysis output (#54)
- Export prompt: `writeFileSync` crash on permission errors, unreachable no-analysis path, Ctrl+C mishandled (#55)
- curl stderr leaking to terminal during benchmark runs (#52, #53)
- Formula explainer now accurately describes KSM calculation
- Updated KSM-SCORING.md and README.md to document token efficiency factor
- 363 tests passing (was 346)
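The token-efficiency factor above can be pictured with a small sketch. Only the 30% cap comes from the changelog; the baseline comparison and linear ramp below are illustrative assumptions, not OASIS's actual formula:

```typescript
// Hypothetical sketch of a token-efficiency penalty capped at 30%.
// `baselineTokens` and the 0.1 ramp are assumptions for illustration;
// only the 30% cap is stated in the changelog.
function applyTokenEfficiencyPenalty(
  score: number,
  tokensUsed: number,
  baselineTokens: number,
): number {
  // Fraction of tokens spent beyond the baseline (0 if under budget)
  const overuse = Math.max(0, tokensUsed / baselineTokens - 1);
  // Linear penalty, clamped at 30%
  const penalty = Math.min(0.3, overuse * 0.1);
  return score * (1 - penalty);
}
```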
- Fixed TOCTOU vulnerability in credential file writes — now uses atomic mode setting (0o600)
- Fixed world-readable result files — benchmark transcripts now written with 0o600 permissions
- Added path validation for `report --output` flag — prevents path traversal, warns on symlinks
- Added input validation for `--max-iterations` — rejects NaN and negative values
- Ollama analyzer now defaults to benchmark model instead of hardcoded llama3.3 (#33)
- API calls now timeout after 120s instead of hanging indefinitely on network issues
- Fixed `analysis: any` type annotations — now uses proper `AnalysisResult` interface
- gradient-string now has graceful fallback for terminals without truecolor support (Windows)
- Deduplicated score bar rendering — now uses shared `renderScoreBar()` helper
- Added 25 new tests (346 total) — XSS escaping, timeout behavior, score edge cases
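The credential-write fix above amounts to setting the file mode at creation time instead of chmod-ing afterwards, which closes the check-to-use window. A minimal sketch of that pattern (the path and helper name are hypothetical, not OASIS's code):

```typescript
import { writeFileSync, unlinkSync, existsSync } from "node:fs";

// Passing `mode` to writeFileSync applies the permission atomically at
// creation, avoiding the TOCTOU race of write-then-chmodSync. Note the
// mode only takes effect when the file is created, hence the unlink.
function writeCredentials(path: string, contents: string): void {
  if (existsSync(path)) unlinkSync(path);
  writeFileSync(path, contents, { mode: 0o600 });
}
```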
- Polished CLI output: gradient banners, boxed layouts, cli-table3 tables throughout
- Live model fetching from provider APIs (config flow + run wizard)
- Share card report format (`oasis report <id> -f share`) — compact markdown for Discord/GitHub
- Standalone HTML report format (`oasis report <id> -f html`) — dark-themed, no external deps
- Clipboard support (`oasis report <id> -f share --clipboard`)
- ATT&CK technique classification now runs on every command during benchmarks (was always null)
- Analyzer backfills step-level techniques from LLM stepsUsed mapping
- Updated provider model lists to current (Claude Opus 4.6, o3, Grok 4, Gemini 2.5 Pro)
- Interactive run wizard uses live model list with spinner + fallback to examples
- Back-navigation wizard integrated with live model fetching
- `executeAndRecordStep` helper now includes technique classification
- Bumped `@anthropic-ai/sdk` ^0.71.2 → ^0.78.0
- Bumped `openai` ^4.0.0 → ^6.25.0 (added type guard for v6 union type in runner)
- CLI `--version` now reads from package.json instead of hardcoded value
- Docker auto-start on macOS when daemon isn't running
- Per-image ARM64 fallback (only emulates containers that need it)
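The openai v6 bump above required narrowing a union return type in the runner. The snippet below is a generic illustration of the user-defined type-guard pattern; the `TextChunk`/`ToolChunk` names are hypothetical and not the actual openai v6 types:

```typescript
// Generic type-guard pattern for narrowing a discriminated union.
// The union members here are invented for illustration.
type TextChunk = { type: "text"; text: string };
type ToolChunk = { type: "tool_call"; name: string };
type Chunk = TextChunk | ToolChunk;

// `chunk is TextChunk` tells the compiler the narrowed type on success
function isTextChunk(chunk: Chunk): chunk is TextChunk {
  return chunk.type === "text";
}

function collectText(chunks: Chunk[]): string {
  return chunks.filter(isTextChunk).map((c) => c.text).join("");
}
```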
- KSM score could exceed 100 when rubric total exceeded 100 points (#29)
- Ollama benchmarks failed with missing OPENAI_API_KEY error (#28)
- Updated provider model lists: added Gemini 3 Flash, Grok 3/4
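The over-100 bug in #29 is the classic rubric-normalization pitfall: dividing by an assumed 100 instead of the rubric's actual point total. A minimal sketch of the corrected shape (an assumption for illustration, not OASIS's exact code):

```typescript
// Normalize earned points against the rubric's real total, then clamp,
// so a rubric worth e.g. 120 points can no longer yield a score > 100.
function normalizeScore(earned: number, rubricTotal: number): number {
  if (rubricTotal <= 0) return 0;
  return Math.min(100, (earned / rubricTotal) * 100);
}
```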
- CLI tool with commands: `run`, `analyze`, `results`, `report`, `challenges`, `config`, `validate`, `providers`
- Multi-provider support: Anthropic, OpenAI, xAI, Google, Ollama, custom endpoints
- LLM-powered post-run analysis with MITRE ATT&CK mapping
- Kryptsec Scoring Model (KSM) with objective + qualitative rubric scoring
- Multiple report formats: terminal, text, JSON, markdown
- Challenge validation against JSON schema
- Rate-limit retry with exponential backoff
- Results summary with OWASP category grouping (`oasis results summary`)
- Challenge comparison view (`oasis results compare --challenge <id>`)
- XDG-compliant configuration (`~/.config/oasis/`)
- 153 automated tests (unit + E2E)
- CI/CD pipeline (GitHub Actions)
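The rate-limit retry listed above follows the standard exponential-backoff pattern. A sketch under assumed parameters (base delay, attempt count, and the rate-limit predicate are not specified in the changelog):

```typescript
// Retry an async call with exponential backoff: 1s, 2s, 4s, ...
// maxAttempts, baseDelayMs, and isRateLimit are illustrative assumptions.
async function withRetry<T>(
  fn: () => Promise<T>,
  isRateLimit: (err: unknown) => boolean,
  maxAttempts = 5,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Re-throw non-rate-limit errors and exhausted retries immediately
      if (!isRateLimit(err) || attempt + 1 >= maxAttempts) throw err;
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```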