inspect-evals-starter-with-policies

A minimal, batteries-included starter kit for evaluating LLM outputs against policy rules (PII, harmful content, jailbreak patterns, etc.). This project provides a simple evaluator, YAML-based policies, and a markdown/CSV report.

Why this exists

Quick-start an internal eval harness without heavy dependencies.
Keep policies in version-controlled YAML.
Produce repeatable, diffable results for CI or ad-hoc checks.

Quickstart

# 1) Create a virtual env (recommended)
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# 2) Install deps
pip install -r requirements.txt

# 3) Run the sample eval
PYTHONPATH=src python scripts/run_eval.py --config configs/eval_config.yaml

The run writes its results to out/ and prints a summary to the terminal. Open the Markdown report in out/ to review the findings.

Project layout

inspect-evals-starter-with-policies/
├─ configs/
│  └─ eval_config.yaml
├─ data/
│  └─ samples/
│     ├─ prompts.jsonl
│     └─ outputs.jsonl
├─ policies/
│  ├─ base_policy.yaml
│  ├─ pii.yaml
│  ├─ harmful_content.yaml
│  └─ jailbreak.yaml
├─ scripts/
│  ├─ run_eval.py
│  └─ summarize_results.py
├─ src/inspect_evals/
│  ├─ __init__.py
│  ├─ datasets.py
│  ├─ evaluator.py
│  ├─ policies.py
│  ├─ report.py
│  ├─ utils.py
│  └─ checks/
│     ├─ __init__.py
│     ├─ pii.py
│     ├─ safety.py
│     └─ jailbreaks.py
├─ tests/
│  ├─ test_evaluator.py
│  └─ test_policies.py
├─ requirements.txt
├─ pyproject.toml
├─ VERSION.txt
├─ LICENSE
└─ .gitignore

Configuration

configs/eval_config.yaml controls which dataset and checks to run, and where to write results. Policies are defined under policies/*.yaml and are loaded by src/inspect_evals/policies.py.

Adding a new policy or check

1) Create a new YAML file under policies/ with your rules.
2) Create a new checker in src/inspect_evals/checks/ that implements run(sample, policies).
3) Add the checker key to checks: in configs/eval_config.yaml.
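
As a rough sketch of the checker step above, a new module might look like the following. Only the run(sample, policies) signature comes from this README. The file name, the policy shape (a dict with a hypothetical blocked_terms list), and the keys of the returned dict are assumptions made for illustration, so adjust them to whatever evaluator.py and your policy YAML actually use.

```python
# Hypothetical checker, e.g. src/inspect_evals/checks/blocked_terms.py.
# Only the run(sample, policies) signature is taken from the README; the
# policy layout and the result keys below are illustrative assumptions.
from typing import Any


def run(sample: dict[str, Any], policies: dict[str, Any]) -> dict[str, Any]:
    """Flag a sample whose output contains any blocked term (case-insensitive)."""
    # Assumed policy shape: {"blocked_terms": ["term one", "term two"]}
    terms = policies.get("blocked_terms", [])
    text = str(sample.get("output", "")).lower()
    # Collect every blocked term that appears as a substring of the output.
    hits = [term for term in terms if term.lower() in text]
    return {
        "id": sample.get("id"),
        "check": "blocked_terms",
        "passed": not hits,
        "matches": hits,
    }
```

Calling run({"id": "s1", "output": "..."}, {"blocked_terms": ["..."]}) returns a dict whose passed field is False when any term matches. Substring matching is the simplest possible rule; a real check would likely need regexes or word boundaries.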

Extending datasets

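Put your prompts and model responses in JSONL files under a new directory (for example, data/your-dataset/). Each line must be one complete JSON object with the fields below. The object is pretty-printed here for readability; in the file it has to sit on a single line: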

{
  "id": "uuid-or-string",
  "prompt": "Your prompt",
  "output": "Model response text"
}

Update paths in configs/eval_config.yaml and rerun.
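
To make the record format concrete, here is a minimal sketch that writes and re-reads a JSONL file with the three fields shown above. The helper names write_jsonl and read_jsonl are hypothetical and are not part of this repository; src/inspect_evals/datasets.py may load data differently.

```python
# Sketch: write and re-read a JSONL dataset with the fields id, prompt, output.
# The helper names are hypothetical and not part of this repository.
import json
from pathlib import Path

REQUIRED_FIELDS = ("id", "prompt", "output")


def write_jsonl(path, records):
    """Write one compact JSON object per line."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            # json.dumps escapes embedded newlines, so each record stays on one line.
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Load records, raising ValueError on a line that lacks a required field."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = [f for f in REQUIRED_FIELDS if f not in record]
            if missing:
                raise ValueError(f"line {line_no}: missing {missing}")
            records.append(record)
    return records
```

Running read_jsonl on a new dataset before pointing the config at it is a cheap way to catch pretty-printed or incomplete records early.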

License

MIT. See LICENSE.

devinnicholson/inspect-evals-starter
