AstroReason-Bench

AstroReason-Bench is evolving into a monorepo for space mission design benchmarks and first-party method layers.

The benchmark core remains algorithm-agnostic: benchmarks define problems, datasets, and verifiers; methods consume benchmarks and never the reverse.

For historical paper reproduction, use the v1 branch.

Repository Shape

astro-reason/
├── benchmarks/   # canonical problems, datasets, verifiers, generators
├── experiments/  # reproducible evaluated runs of methods
├── solvers/      # traditional solver implementations
├── runtimes/     # reusable execution substrates for agentic systems
├── scripts/      # repo-owned orchestration and validation entrypoints
└── tests/        # focused benchmark and tooling tests

Design Principles

Algorithm-agnostic benchmark core: benchmarks/ does not encode preferred solving strategies.
One-way contracts: methods may depend on benchmarks; benchmarks must not depend on method code.
Standalone benchmarks: each benchmark remains self-contained.
Reproducible method layers: experiments, solvers, and runtimes should be runnable and inspectable.

Directory Roles

benchmarks/ owns public benchmark definitions and benchmark-side tooling.
experiments/ owns flat runnable experiment families and shared prompt/config fragments.
solvers/ owns traditional non-agentic solver methods.
runtimes/ owns reusable agent runtime environments, build logic, and shared runtime assets.

Environment

Benchmark-core development uses uv. Method-owned directories may use different tooling when justified, as long as benchmark contracts stay clean.

Status

Current work is focused on:

refining benchmark definitions and verifier contracts
developing traditional solvers as baselines
implementing and running agentic experiments

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AstroReason-Bench

Repository Shape

Design Principles

Directory Roles

Environment

Status

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

AstroReason-Bench

Repository Shape

Design Principles

Directory Roles

Environment

Status