Changelog

All notable changes to OASIS will be documented in this file.

The format is based on Keep a Changelog.

[0.1.5] - 2026-02-26

Added

  • KSM now includes token efficiency as a third scoring factor — models that burn excessive tokens get penalized up to 30% (#47, #50)
  • Interactive export prompt after benchmark runs — copy share card or save HTML report (#48, #51)
  • Share / export option in results browser detail menu
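
As a rough illustration of the token efficiency factor, the sketch below applies a multiplicative penalty capped at 30%. Only the 30% cap comes from this changelog. The function name, the baseline budget, and the linear ramp are assumptions; the actual formula is the one documented in KSM-SCORING.md.

```typescript
// Cap taken from the changelog entry: penalty of up to 30%.
const MAX_TOKEN_PENALTY = 0.3;

// Hypothetical helper: returns a multiplier in [0.7, 1] for a score.
// Assumption: no penalty at or below the baseline budget, then a linear
// ramp that reaches the cap once usage is double the baseline.
function tokenEfficiencyFactor(tokensUsed: number, baselineTokens: number): number {
  if (baselineTokens <= 0 || tokensUsed <= baselineTokens) {
    return 1;
  }
  const excessRatio = (tokensUsed - baselineTokens) / baselineTokens;
  const penalty = Math.min(MAX_TOKEN_PENALTY, excessRatio * MAX_TOKEN_PENALTY);
  return 1 - penalty;
}
```

Under these assumptions, a run that uses five times the baseline is scaled by the floor of 0.7, and a run within budget is left unchanged.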

Fixed

  • Anthropic token undercount: input_tokens excludes cached tokens, now sums all three fields (#44, #45)
  • Score label disambiguation: "Overall Score" → "Strategy Score" for LLM assessment, "Score" → "KSM" in table headers (#46, #49)
  • Remaining label inconsistencies in markdown, text, and terminal analysis output (#54)
  • Export prompt: writeFileSync crash on permission errors, unreachable no-analysis path, Ctrl+C mishandled (#55)
  • curl stderr leaking to terminal during benchmark runs (#52, #53)
  • Formula explainer now accurately describes KSM calculation
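
The Anthropic undercount fix can be pictured with the minimal sketch below. It assumes the "three fields" are the usage fields `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens` returned by the Anthropic Messages API. The interface and function names are illustrative, not taken from the OASIS source.

```typescript
// Subset of the usage object returned by the Anthropic Messages API.
// The cache fields may be absent or null when prompt caching is not used.
interface AnthropicUsage {
  input_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
  output_tokens: number;
}

// Hypothetical helper: input_tokens alone excludes cached tokens,
// so the full prompt size is the sum of all three input-side fields.
function totalInputTokens(usage: AnthropicUsage): number {
  return (
    usage.input_tokens +
    (usage.cache_creation_input_tokens ?? 0) +
    (usage.cache_read_input_tokens ?? 0)
  );
}
```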

Changed

  • Updated KSM-SCORING.md and README.md to document token efficiency factor
  • 363 tests passing (was 346)

[0.1.4] - 2026-02-27

Security

  • Fixed TOCTOU vulnerability in credential file writes — now uses atomic mode setting (0o600)
  • Fixed world-readable result files — benchmark transcripts now written with 0o600 permissions
  • Added path validation for report --output flag — prevents path traversal, warns on symlinks
  • Added input validation for --max-iterations — rejects NaN and negative values
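
A minimal sketch of the --max-iterations check described above, assuming the flag arrives as a raw string. The function name and error message are hypothetical. Treating zero as valid is also an assumption, since the entry only mentions NaN and negative values.

```typescript
// Hypothetical parser for the --max-iterations flag value.
function parseMaxIterations(raw: string): number {
  const value = parseInt(raw, 10);
  // Reject input that is not a number (NaN) and negative values.
  if (isNaN(value) || value < 0) {
    throw new RangeError(`--max-iterations must be a non-negative integer, got "${raw}"`);
  }
  return value;
}
```

Note that `parseInt` accepts trailing garbage such as "10abc"; a stricter parser would be needed if that input should also be rejected.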

Fixed

  • Ollama analyzer now defaults to benchmark model instead of hardcoded llama3.3 (#33)
  • API calls now time out after 120s instead of hanging indefinitely on network issues
  • Fixed "analysis: any" type annotations; now uses the proper AnalysisResult interface
  • gradient-string now has graceful fallback for terminals without truecolor support (Windows)

Changed

  • Deduplicated score bar rendering — now uses shared renderScoreBar() helper
  • Added 25 new tests (346 total) — XSS escaping, timeout behavior, score edge cases

[0.1.3] - 2026-02-24

Added

  • Polished CLI output: gradient banners, boxed layouts, cli-table3 tables throughout
  • Live model fetching from provider APIs (config flow + run wizard)
  • Share card report format (oasis report <id> -f share) — compact markdown for Discord/GitHub
  • Standalone HTML report format (oasis report <id> -f html) — dark-themed, no external deps
  • Clipboard support (oasis report <id> -f share --clipboard)

Fixed

  • ATT&CK technique classification now runs on every command during benchmarks (was always null)
  • Analyzer backfills step-level techniques from LLM stepsUsed mapping
  • Updated provider model lists to current (Claude Opus 4.6, o3, Grok 4, Gemini 2.5 Pro)

Changed

  • Interactive run wizard uses live model list with spinner + fallback to examples
  • Back-navigation wizard integrated with live model fetching
  • executeAndRecordStep helper now includes technique classification
  • Bumped @anthropic-ai/sdk ^0.71.2 → ^0.78.0
  • Bumped openai ^4.0.0 → ^6.25.0 (added type guard for v6 union type in runner)

[0.1.2] - 2026-02-23

Fixed

  • CLI --version now reads from package.json instead of hardcoded value
  • Docker auto-start on macOS when daemon isn't running
  • Per-image ARM64 fallback (only emulates containers that need it)

[0.1.1] - 2026-02-23

Fixed

  • KSM score could exceed 100 when rubric total exceeded 100 points (#29)
  • Ollama benchmarks failed with missing OPENAI_API_KEY error (#28)
  • Updated provider model lists: added Gemini 3 Flash, Grok 3/4
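
To illustrate the KSM overflow bug (#29), the sketch below shows one way to keep a score within 0 to 100 when a rubric's point total is not exactly 100. Both the normalization by rubric total and the function name are assumptions; the changelog does not say how the fix was implemented.

```typescript
// Hypothetical helper: scales earned points by the rubric's own total,
// then clamps the result so it can never fall outside 0..100.
function normalizeScore(earnedPoints: number, rubricTotal: number): number {
  if (rubricTotal <= 0) {
    return 0; // Assumption: an empty or invalid rubric scores zero.
  }
  const scaled = (earnedPoints / rubricTotal) * 100;
  return Math.min(100, Math.max(0, scaled));
}
```

With a 120-point rubric, 110 earned points map to roughly 91.7 rather than 110.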

[0.1.0] - 2026-02-16

Added

  • CLI tool with commands: run, analyze, results, report, challenges, config, validate, providers
  • Multi-provider support: Anthropic, OpenAI, xAI, Google, Ollama, custom endpoints
  • LLM-powered post-run analysis with MITRE ATT&CK mapping
  • Kryptsec Scoring Model (KSM) with objective + qualitative rubric scoring
  • Multiple report formats: terminal, text, JSON, markdown
  • Challenge validation against JSON schema
  • Rate-limit retry with exponential backoff
  • Results summary with OWASP category grouping (oasis results summary)
  • Challenge comparison view (oasis results compare --challenge <id>)
  • XDG-compliant configuration (~/.config/oasis/)
  • 153 automated tests (unit + E2E)
  • CI/CD pipeline (GitHub Actions)
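
The rate-limit retry entry above refers to exponential backoff. A minimal sketch of the delay schedule follows. The base delay, the cap, and the function name are hypothetical, since the changelog does not state the values OASIS uses.

```typescript
// Hypothetical defaults; not taken from the OASIS source.
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// Delay before retry number `attempt` (0-based): doubles each time, up to a cap.
function backoffDelayMs(
  attempt: number,
  baseMs: number = BASE_DELAY_MS,
  maxMs: number = MAX_DELAY_MS
): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Delays for the first six retries: 1000, 2000, 4000, 8000, 16000, 30000.
const schedule = [0, 1, 2, 3, 4, 5].map((attempt) => backoffDelayMs(attempt));
```

Production implementations often add random jitter to each delay so that concurrent clients do not retry in lockstep.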