This file provides instructions for AI coding assistants working on this repository.
This repository converts custom LLM evaluation formats into Inspect AI's canonical eval format using the Task framework (not manual EvalLog construction).
- ALWAYS use
task.py- Never createconvert.pyfiles - Use
task_main()helper - Never manually construct EvalLog objects - Store messages in metadata - Use
sample.metadata["messages"]notsample.messages - Output naming: Always creates
input.eval(same name as input with.evalextension)
- Copy from example: Start with
examples/simple_chat/task.pyas template - Read documentation: Check
docs/INDEX.mdanddocs/CONVERSION_GUIDE.md - Study similar examples: Pick the closest match to your format
- Create task.py: Follow the required structure (see
.cursor/rules/001-core-patterns.mdc) - Test immediately: Run
uv run python task.py input.jsonand validate output
Every task.py MUST have:
@solver
def replay_solve():
# Replay pre-recorded messages
@scorer(metrics=[mean()])
def score_scorer():
# Read score from metadata
@task
def my_task(input_path: Path, model_name: str, **kwargs) -> Task:
# Create Sample objects with metadata
# Return Task with solver and scorer
if __name__ == "__main__":
task_main(my_task, get_model_name)- Examples:
examples/*/task.py- Study these patterns - Documentation:
docs/CONVERSION_GUIDE.md- Step-by-step guide - Utilities:
src/inspect_convertor/utils.py-task_main()helper - Troubleshooting:
docs/TROUBLESHOOTING.md- Common issues
- ❌ Create
convert.pyfiles - ❌ Use
EvalLog,EvalSample,EvalSpecdirectly - ❌ Use deprecated
safe_convert_message()orsafe_extract_score() - ❌ Use
ConversionContextorcreate_conversion_context() - ❌ Create
output.evalfiles (useinput.eval)
- ✅ Use
@taskdecorator - ✅ Use
task_main()frominspect_convertor.utils - ✅ Store messages in
sample.metadata["messages"] - ✅ Create ModelEvents in metadata for tools/branching
- ✅ Run
make testafter changes - ✅ Validate output with
inspect-convert-validate
# 1. Study example
cat examples/simple_chat/task.py
# 2. Create new task.py based on pattern
# (follow .cursor/rules/001-core-patterns.mdc)
# 3. Install dependencies (if needed)
uv pip install -e .
# 4. Test it
uv run python examples/my_format/task.py examples/my_format/input.json
# 5. Validate
inspect-convert-validate examples/my_format/input.eval
# 6. Run all tests
make test- Check
.cursor/rules/for detailed patterns - Read
docs/CONVERSION_GUIDE.mdfor complete examples - See
docs/TROUBLESHOOTING.mdfor error solutions - Look at
examples/for working implementations