
Agent Instructions for Inspect Eval Convertor

This file provides instructions for AI coding assistants working on this repository.

Project Overview

This repository converts custom LLM evaluation formats into Inspect AI's canonical .eval log format, using the Task framework rather than manual EvalLog construction.

Critical Principles

  1. ALWAYS use task.py - never create convert.py files
  2. Use the task_main() helper - never construct EvalLog objects manually
  3. Store messages in metadata - use sample.metadata["messages"], not sample.messages
  4. Output naming - the output file is always the input filename with its extension replaced by .eval (e.g. input.json becomes input.eval), as illustrated below
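
A minimal sketch of principles 3 and 4, using the inspect_ai Sample class and pathlib. The message dicts and the "score" metadata key are illustrative assumptions; the actual fields depend on your source format and on what your scorer reads.

from pathlib import Path
from inspect_ai.dataset import Sample

input_path = Path("input.json")
output_path = input_path.with_suffix(".eval")  # input.json -> input.eval

sample = Sample(
    input="What is 2+2?",
    metadata={
        # pre-recorded conversation, replayed by the solver
        "messages": [
            {"role": "user", "content": "What is 2+2?"},
            {"role": "assistant", "content": "4"},
        ],
        # "score" is an assumed key name; use whatever your scorer expects
        "score": 1.0,
    },
)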

When Creating a New Converter

  1. Copy from example: Start with examples/simple_chat/task.py as template
  2. Read documentation: Check docs/INDEX.md and docs/CONVERSION_GUIDE.md
  3. Study similar examples: Pick the closest match to your format
  4. Create task.py: Follow the required structure (see .cursor/rules/001-core-patterns.mdc)
  5. Test immediately: Run uv run python task.py input.json and validate output

Task.py Structure

Every task.py MUST have:

from pathlib import Path

from inspect_ai import Task, task
from inspect_ai.scorer import Score, mean, scorer
from inspect_ai.solver import solver
from inspect_convertor.utils import task_main

@solver
def replay_solve():
    async def solve(state, generate):
        # Replay the pre-recorded messages stored in sample metadata
        state.messages = state.metadata["messages"]
        return state
    return solve

@scorer(metrics=[mean()])
def score_scorer():
    async def score(state, target):
        # Read the pre-recorded score from sample metadata
        return Score(value=state.metadata["score"])
    return score

@task
def my_task(input_path: Path, model_name: str, **kwargs) -> Task:
    # Parse input_path, create Sample objects with metadata,
    # and return a Task wired to the solver and scorer above
    ...

if __name__ == "__main__":
    task_main(my_task, get_model_name)
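
Note: @solver and @scorer are factories that return an async callable; that closure pattern is standard Inspect AI. The bodies above are only a sketch (the "score" metadata key is an assumption); see examples/*/task.py for complete, working implementations.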

Files to Reference

  • Examples: examples/*/task.py - Study these patterns
  • Documentation: docs/CONVERSION_GUIDE.md - Step-by-step guide
  • Utilities: src/inspect_convertor/utils.py - task_main() helper
  • Troubleshooting: docs/TROUBLESHOOTING.md - Common issues

Never Do

  • ❌ Create convert.py files
  • ❌ Use EvalLog, EvalSample, EvalSpec directly
  • ❌ Use deprecated safe_convert_message() or safe_extract_score()
  • ❌ Use ConversionContext or create_conversion_context()
  • ❌ Create output.eval files (use input.eval)

Always Do

  • ✅ Use @task decorator
  • ✅ Use task_main() from inspect_convertor.utils
  • ✅ Store messages in sample.metadata["messages"]
  • ✅ Create ModelEvents in metadata for tools/branching
  • ✅ Run make test after changes
  • ✅ Validate output with inspect-convert-validate

Example Workflow

# 1. Study example
cat examples/simple_chat/task.py

# 2. Create new task.py based on pattern
# (follow .cursor/rules/001-core-patterns.mdc)

# 3. Install dependencies (if needed)
uv pip install -e .

# 4. Test it
uv run python examples/my_format/task.py examples/my_format/input.json

# 5. Validate
inspect-convert-validate examples/my_format/input.eval

# 6. Run all tests
make test

Getting Help

  • Check .cursor/rules/ for detailed patterns
  • Read docs/CONVERSION_GUIDE.md for complete examples
  • See docs/TROUBLESHOOTING.md for error solutions
  • Look at examples/ for working implementations