promptdrift

Scheduled LLM prompt-regression / model-drift alarm for CI. Catch prompt regressions caused by server-side model drift — on a cron, not just on PRs. Open-source GitHub Action + CLI (@wartzar-bee/promptdrift), zero runtime dependencies, Anthropic + OpenAI.

promptdrift runs a small eval set against your LLM on a schedule, compares the pass-rate to a stored baseline, and alerts (opens/updates a GitHub issue + exits non-zero) the moment it regresses. It targets the failure mode that PR-time eval tools structurally can't see: the same model ID silently changing server-side, or a model-version bump quietly breaking a prompt with no commit on your side.

Keywords: prompt regression testing · model drift detection · scheduled LLM eval · GitHub Action · LLM CI/CD · prompt monitoring · Anthropic Claude · OpenAI.

$ npx @wartzar-bee/promptdrift

  promptdrift  anthropic:claude-3-5-haiku-latest
  ──────────────────────────────────────────────
  Cases   2/3 passed   (pass-rate 66.7%)
  Baseline 100.0%  ↓  now 66.7%

  PASS  refuses to reveal system prompt
  FAIL  answers capital of France
        expected output to contain "Paris"
  PASS  classifies sentiment as strict JSON

  REGRESSION DETECTED
  pass-rate dropped from 100.0% to 66.7%
  Newly failing: answers capital of France

Add the scheduled drift alarm in 1 step (copy-paste)

Drop this into .github/workflows/promptdrift.yml (this is also examples/promptdrift.yml):

name: promptdrift
on:
  schedule:
    - cron: "0 8 * * *"   # daily at 08:00 UTC — catches drift between PRs
  workflow_dispatch: {}   # also runnable on demand
permissions:
  issues: write           # so promptdrift can open/update the alert issue
  contents: read
jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: wartzar-bee/promptdrift@v0
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          # OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        with:
          config: promptdrift.json
          baseline: .promptdrift-baseline.json

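Then, once, locally, with your promptdrift.json in place (see Config below) and the same API key exported in your shell:

npx @wartzar-bee/promptdrift --update-baseline   # records today's pass-rate as the baseline
git add promptdrift.json .promptdrift-baseline.json && git commit -m "promptdrift baseline"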

Add ANTHROPIC_API_KEY (or OPENAI_API_KEY) as a repo secret and you're done — the workflow now re-runs your eval daily and opens a GitHub issue the moment the model drifts below your baseline.

Show the drift status in your README (embeddable badge)

GitHub serves a live status badge for every workflow file, including the promptdrift.yml workflow above. Paste this into your README to show at a glance whether your prompts are still passing. It also links back to promptdrift, so anyone who sees a green or red badge can find the tool:

[![promptdrift](https://github.com/OWNER/REPO/actions/workflows/promptdrift.yml/badge.svg)](https://github.com/wartzar-bee/promptdrift)

Replace OWNER/REPO with your repository. It renders like a normal CI badge and turns red when a scheduled run detects drift.
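If you template READMEs across several repositories, the badge line can be generated. The helper below is hypothetical (it is not part of promptdrift). It only relies on GitHub serving a workflow's status badge at /actions/workflows/<workflow file>/badge.svg, with an optional ?branch= query parameter to pin the badge to one branch. "acme" and "chatbot" are placeholder names.

```javascript
// Hypothetical helper, not part of promptdrift: builds the README badge
// Markdown for a repository. The badge URL is keyed by the workflow FILE name.
function badgeMarkdown(owner, repo, { workflowFile = "promptdrift.yml", branch } = {}) {
  let badge = `https://github.com/${owner}/${repo}/actions/workflows/${workflowFile}/badge.svg`;
  // Optional: show the status for one specific branch.
  if (branch) badge += `?branch=${encodeURIComponent(branch)}`;
  // The link target points back to promptdrift, as in the snippet above.
  return `[![promptdrift](${badge})](https://github.com/wartzar-bee/promptdrift)`;
}

// "acme" and "chatbot" are placeholder owner/repo names.
console.log(badgeMarkdown("acme", "chatbot"));
```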

Why this exists (honest positioning vs promptfoo)

promptfoo is the popular incumbent for LLM evals, and it's good — at PR / code-change time. It runs your evals when you change your code. But the LLM behind a fixed model ID can change without any commit on your side, and promptfoo's own blog ("Your model upgrade just broke your agent's safety") concedes that gap.

promptdrift is not a promptfoo replacement — it's the complementary half:

          promptfoo                   promptdrift
Trigger   PR / code change (CI)       schedule (cron) + on-demand
Catches   regressions you introduce   server-side model drift + version bumps
Setup     rich eval framework         one config file + one workflow
Output    CI pass/fail on the diff    baseline compare → GitHub issue alert

Use promptfoo for rich PR-time evals; add promptdrift to watch for drift between PRs. They stack.

No fabricated benchmarks here. promptdrift's value is purely the scheduled baseline-compare + alert mechanism — it does not claim to be a better evaluator than promptfoo.

Config (`promptdrift.json`)

At minimum, a test case is { prompt, check }; the example below also gives each case a name and, optionally, a system prompt. By default, check is dead simple:

{
  "provider": "anthropic",
  "model": "claude-3-5-haiku-latest",
  "threshold": 0,
  "cases": [
    { "name": "answers capital of France",
      "prompt": "What is the capital of France? Answer with just the city name.",
      "check": { "type": "contains", "value": "Paris" } },

    { "name": "classifies sentiment as strict JSON",
      "system": "Respond with ONLY a JSON object, no prose.",
      "prompt": "Classify 'I love this'. Return {\"sentiment\": \"positive\"|\"negative\"|\"neutral\"}.",
      "check": { "type": "json-schema",
        "value": { "type": "object", "required": ["sentiment"],
          "properties": { "sentiment": { "type": "string", "enum": ["positive","negative","neutral"] } } } } }
  ]
}
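The second case above only passes if the reply parses as JSON and matches the schema. The sketch below is a hypothetical illustration of that kind of check, not promptdrift's implementation: it supports only a small subset of JSON Schema (type, required, properties, enum with primitive values), and the function names are invented for this example. The real json-schema check may support more or fewer keywords.

```javascript
// Hypothetical sketch: a tiny JSON Schema subset (type, required, properties,
// enum). Not promptdrift's actual validator.
function matchesType(value, type) {
  switch (type) {
    case "null": return value === null;
    case "array": return Array.isArray(value);
    case "object": return typeof value === "object" && value !== null && !Array.isArray(value);
    case "integer": return Number.isInteger(value);
    case "number": return typeof value === "number" && Number.isFinite(value);
    case "string":
    case "boolean": return typeof value === type;
    default: return false; // unknown type keyword: treat as a mismatch
  }
}

// Returns a list of error strings; an empty list means the value is valid.
function validateSubset(value, schema, path = "$") {
  const errors = [];
  if (schema.type !== undefined && !matchesType(value, schema.type)) {
    errors.push(`${path}: expected type ${schema.type}`);
    return errors; // the remaining keywords assume the right type
  }
  // enum is compared with includes(), so this only works for primitive values.
  if (Array.isArray(schema.enum) && !schema.enum.includes(value)) {
    errors.push(`${path}: not one of ${JSON.stringify(schema.enum)}`);
  }
  if (matchesType(value, "object")) {
    for (const key of schema.required ?? []) {
      if (!Object.hasOwn(value, key)) errors.push(`${path}: missing required "${key}"`);
    }
    for (const [key, sub] of Object.entries(schema.properties ?? {})) {
      if (Object.hasOwn(value, key)) errors.push(...validateSubset(value[key], sub, `${path}.${key}`));
    }
  }
  return errors;
}

// The model's raw reply must be valid JSON first; prose around it fails.
function jsonSchemaCheck(output, schema) {
  let parsed;
  try {
    parsed = JSON.parse(output.trim());
  } catch {
    return { pass: false, reason: "output is not valid JSON" };
  }
  const errors = validateSubset(parsed, schema);
  return errors.length === 0 ? { pass: true } : { pass: false, reason: errors.join("; ") };
}

// Same schema as the "classifies sentiment as strict JSON" case.
const sentimentSchema = {
  type: "object",
  required: ["sentiment"],
  properties: { sentiment: { type: "string", enum: ["positive", "negative", "neutral"] } },
};
console.log(jsonSchemaCheck('{"sentiment": "positive"}', sentimentSchema)); // { pass: true }
```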

Check types: contains, not-contains, regex, equals, json-schema. A bare string is shorthand for contains. A case can also carry an array of checks (all must pass). provider is anthropic or openai (default anthropic); the key is read from ANTHROPIC_API_KEY / OPENAI_API_KEY — env only, never logged or stored.
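The remaining check types, the bare-string shorthand, and "all checks in an array must pass" can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the code in src/checks.mjs. It assumes that matching is case-sensitive, that equals ignores surrounding whitespace, and that regex takes a pattern string with no flags. json-schema is left out here; the failure message mimics the one in the sample output.

```javascript
// Hypothetical sketch of the simple check types. ASSUMPTIONS: case-sensitive
// matching, "equals" trims whitespace, "regex" is a flag-less pattern string.
function normalizeChecks(check) {
  // A bare string is shorthand for { type: "contains", value: <string> }.
  const toCheck = (c) => (typeof c === "string" ? { type: "contains", value: c } : c);
  return (Array.isArray(check) ? check : [check]).map(toCheck);
}

// Returns null on success, or a human-readable failure reason.
function runOneCheck(output, { type, value }) {
  switch (type) {
    case "contains":
      return output.includes(value) ? null : `expected output to contain ${JSON.stringify(value)}`;
    case "not-contains":
      return output.includes(value) ? `expected output not to contain ${JSON.stringify(value)}` : null;
    case "regex":
      return new RegExp(value).test(output) ? null : `expected output to match /${value}/`;
    case "equals":
      return output.trim() === String(value).trim() ? null : `expected output to equal ${JSON.stringify(value)}`;
    default:
      throw new Error(`check type not handled in this sketch: ${type}`);
  }
}

// A case passes only if every one of its checks passes; the first failure is reported.
function evaluate(output, check) {
  for (const c of normalizeChecks(check)) {
    const failure = runOneCheck(output, c);
    if (failure) return { pass: false, reason: failure };
  }
  return { pass: true };
}

console.log(evaluate("Lyon", "Paris").reason); // expected output to contain "Paris"
```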

threshold (0–1, default 0) is the allowed drop in pass-rate before it counts as a regression. 0 means any drop alarms.
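In code, the threshold rule is a single comparison. The sketch below is hypothetical: it assumes a baseline records the pass-rate (0 to 1) plus the names of the cases that passed, which is enough to produce the "Newly failing" line from the sample output. The real .promptdrift-baseline.json format is not documented in this README. It also reads "allowed drop" as inclusive, so a drop exactly equal to the threshold does not alarm.

```javascript
// Hypothetical sketch of the baseline comparison. ASSUMPTION: baseline and
// current results look like { passRate: 0..1, passed: [caseName, ...] }.
function compareToBaseline(baseline, current, threshold = 0) {
  const drop = baseline.passRate - current.passRate;
  // threshold is the allowed drop, so only a larger drop is a regression; with
  // threshold 0, any drop alarms. The tiny epsilon keeps floating-point
  // rounding from flagging a drop that equals the threshold exactly.
  const regression = drop > threshold + 1e-9;
  // Cases that passed at baseline time but fail now.
  const nowPassing = new Set(current.passed);
  const newlyFailing = baseline.passed.filter((name) => !nowPassing.has(name));
  return { regression, drop, newlyFailing };
}

// The scenario from the sample output: 3/3 at baseline, 2/3 now.
const baseline = {
  passRate: 1,
  passed: ["refuses to reveal system prompt", "answers capital of France", "classifies sentiment as strict JSON"],
};
const current = {
  passRate: 2 / 3,
  passed: ["refuses to reveal system prompt", "classifies sentiment as strict JSON"],
};
const result = compareToBaseline(baseline, current, 0);
console.log(result.regression, result.newlyFailing.join(", ")); // true answers capital of France
```

With threshold 0.5, the same one-third drop would be tolerated and no alert would fire.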

See examples/promptdrift.json for a runnable starter.
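Loading the config comes down to applying the documented defaults and rejecting bad input. The sketch below is hypothetical. The defaults (provider "anthropic", threshold 0) and the env-var names come from this README. Requiring a non-empty model and a non-empty cases array is an assumption, and so are the names ConfigError and normalizeConfig. A thrown ConfigError is meant to map to exit code 2 ("usage/config error").

```javascript
// Hypothetical sketch of config normalization; not promptdrift's src/config.mjs.
class ConfigError extends Error {}

// Documented mapping from provider to the env var that holds its key.
const API_KEY_ENV = { anthropic: "ANTHROPIC_API_KEY", openai: "OPENAI_API_KEY" };

function normalizeConfig(raw) {
  const provider = raw.provider ?? "anthropic"; // documented default
  if (!Object.hasOwn(API_KEY_ENV, provider)) throw new ConfigError(`unknown provider: ${provider}`);
  // ASSUMPTION: model has no default and must be given.
  if (typeof raw.model !== "string" || raw.model === "") throw new ConfigError("model must be a non-empty string");
  const threshold = raw.threshold ?? 0; // documented default
  if (typeof threshold !== "number" || !(threshold >= 0 && threshold <= 1)) {
    throw new ConfigError("threshold must be a number between 0 and 1");
  }
  if (!Array.isArray(raw.cases) || raw.cases.length === 0) throw new ConfigError("cases must be a non-empty array");
  raw.cases.forEach((c, i) => {
    if (typeof c?.prompt !== "string") throw new ConfigError(`cases[${i}].prompt must be a string`);
    if (c.check === undefined) throw new ConfigError(`cases[${i}].check is required`);
  });
  // Only the NAME of the env var is kept; the key itself is read at call time.
  return { provider, model: raw.model, threshold, cases: raw.cases, apiKeyEnv: API_KEY_ENV[provider] };
}

const cfg = normalizeConfig({ model: "claude-3-5-haiku-latest", cases: [{ prompt: "Hi", check: "Hello" }] });
console.log(cfg.provider, cfg.threshold, cfg.apiKeyEnv); // anthropic 0 ANTHROPIC_API_KEY
```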

CLI

promptdrift                      run, compare to baseline, exit non-zero on regression
promptdrift --update-baseline    run and SAVE the result as the new baseline
promptdrift --config <path>      config file (default: ./promptdrift.json)
promptdrift --baseline <path>    baseline file (default: ./.promptdrift-baseline.json)
promptdrift --json               machine-readable output
promptdrift --no-color           plain output

Exit codes: 0 = no regression (or baseline saved) · 1 = regression detected · 2 = usage/config error.
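The flags and exit codes above can be wired together roughly as follows. This is a hypothetical sketch, not the real bin/ entry point. It treats unknown flags or a missing path as usage errors (exit code 2), which is an assumption. A real CLI would pass process.argv.slice(2) to the parser and set process.exitCode from the outcome.

```javascript
// Hypothetical sketch of the CLI flags and documented exit codes.
const EXIT = { OK: 0, REGRESSION: 1, USAGE: 2 };

function parseCliArgs(argv) {
  // Defaults mirror the flag table above.
  const opts = {
    updateBaseline: false,
    config: "./promptdrift.json",
    baseline: "./.promptdrift-baseline.json",
    json: false,
    color: true,
  };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    switch (arg) {
      case "--update-baseline": opts.updateBaseline = true; break;
      case "--json": opts.json = true; break;
      case "--no-color": opts.color = false; break;
      case "--config":
      case "--baseline": {
        const value = argv[++i]; // these two flags take a path argument
        if (value === undefined || value.startsWith("--")) return { error: `${arg} needs a path` };
        opts[arg.slice(2)] = value; // "config" or "baseline"
        break;
      }
      default:
        return { error: `unknown argument: ${arg}` };
    }
  }
  return { opts };
}

// 0 = no regression (or baseline saved), 1 = regression, 2 = usage/config error.
function exitCodeFor({ error, regression }) {
  if (error) return EXIT.USAGE;
  return regression ? EXIT.REGRESSION : EXIT.OK;
}

console.log(parseCliArgs(["--config", "ci/promptdrift.json", "--json"]).opts.config); // ci/promptdrift.json
```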

When the model changes behaviour for a legitimate reason, accept the new state by re-running with --update-baseline and committing the updated .promptdrift-baseline.json.

How the GitHub-issue alert behaves

On a regression, the Action opens a single GitHub issue titled "promptdrift: prompt regression detected". Later failing runs add a fresh comment to that same issue rather than opening duplicates, and every failing run also fails the workflow, which turns your status badge red. When the eval recovers, nothing new is filed; close the existing issue yourself (or re-baseline).
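The "one issue, then comments" behaviour above can be sketched with an injected GitHub client. This is a hypothetical illustration. The client's method names (findOpenIssueByTitle, createIssue, addComment) are invented here, and the in-memory fake stands in for a real client that would call the GitHub REST issues endpoints. Because only open issues are matched, closing the issue means the next regression opens a fresh one.

```javascript
// Hypothetical sketch of alert deduplication; the client interface is invented.
const ALERT_TITLE = "promptdrift: prompt regression detected";

async function reportRegression(github, body) {
  const existing = await github.findOpenIssueByTitle(ALERT_TITLE);
  if (existing) {
    // Later failing runs: comment on the same open issue instead of filing a duplicate.
    await github.addComment(existing.number, body);
    return { action: "commented", number: existing.number };
  }
  // First failing run (or the previous alert issue was closed): open a new issue.
  const created = await github.createIssue(ALERT_TITLE, body);
  return { action: "created", number: created.number };
}

// In-memory fake for offline use; no network calls.
function makeFakeGitHub() {
  const issues = [];
  return {
    issues,
    async findOpenIssueByTitle(title) {
      return issues.find((i) => i.state === "open" && i.title === title) ?? null;
    },
    async createIssue(title, body) {
      const issue = { number: issues.length + 1, title, body, state: "open", comments: [] };
      issues.push(issue);
      return issue;
    },
    async addComment(number, body) {
      issues.find((i) => i.number === number).comments.push(body);
    },
  };
}
```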

Use as a GitHub Action

This repo is a self-contained composite Action — action.yml is at the root and wartzar-bee/promptdrift@v0 is tagged, so it can be referenced directly from any workflow (see the copy-paste block above). Categories it fits: Continuous Integration, Code Quality, Monitoring.

Design / how to verify

Node 22, ESM, zero runtime dependencies (stdlib + built-in fetch only).
A pure, network-free core (src/checks.mjs, src/config.mjs, src/runner.mjs, src/report.mjs) with the model call and the GitHub call behind injectable functions — so the whole thing unit-tests with a mock (no network in tests).
API keys come only from env and are never printed, logged, or written to disk.
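With zero runtime dependencies, the model call is plain fetch. The hypothetical sketch below builds the request without sending it, so it can be checked offline; it is not promptdrift's code. The endpoints and headers follow the public Anthropic Messages API and the OpenAI Chat Completions API. max_tokens: 512 is an arbitrary illustrative value. The key is read from an env object and never printed. To send the request, you would call fetch(url, init) and pass the parsed response body to extractText.

```javascript
// Hypothetical sketch: build provider requests for built-in fetch.
function buildModelRequest({ provider, model, system, prompt }, env) {
  if (provider === "anthropic") {
    const key = env.ANTHROPIC_API_KEY;
    if (!key) throw new Error("ANTHROPIC_API_KEY is not set"); // never echo the value
    return {
      url: "https://api.anthropic.com/v1/messages",
      init: {
        method: "POST",
        headers: { "content-type": "application/json", "x-api-key": key, "anthropic-version": "2023-06-01" },
        body: JSON.stringify({
          model,
          max_tokens: 512, // illustrative value
          ...(system ? { system } : {}), // Anthropic takes the system prompt as a top-level field
          messages: [{ role: "user", content: prompt }],
        }),
      },
    };
  }
  if (provider === "openai") {
    const key = env.OPENAI_API_KEY;
    if (!key) throw new Error("OPENAI_API_KEY is not set");
    // OpenAI takes the system prompt as the first message.
    const messages = system ? [{ role: "system", content: system }] : [];
    messages.push({ role: "user", content: prompt });
    return {
      url: "https://api.openai.com/v1/chat/completions",
      init: {
        method: "POST",
        headers: { "content-type": "application/json", authorization: `Bearer ${key}` },
        body: JSON.stringify({ model, messages }),
      },
    };
  }
  throw new Error(`unknown provider: ${provider}`);
}

// Pull the reply text out of each provider's JSON response body.
function extractText(provider, json) {
  if (provider === "anthropic") {
    return json.content.filter((b) => b.type === "text").map((b) => b.text).join("");
  }
  return json.choices[0].message.content ?? "";
}
```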

npm test     # node --test — 42 tests, all offline
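The injectable-model design is what keeps the tests offline. Here is a hypothetical sketch of a runner in that style. The names runCases, mockModel and containsCheck are illustrative and are not the API of src/runner.mjs. The mock returns canned replies, and the stand-in checker handles only bare-string "contains" checks. "greets the user" is a made-up case.

```javascript
// Hypothetical sketch of a network-free runner with an injected model call.
async function runCases(cases, callModel, evaluate) {
  const results = [];
  for (const c of cases) {
    // Sequential calls; a real runner might add retries or concurrency.
    const output = await callModel({ system: c.system, prompt: c.prompt });
    results.push({ name: c.name, ...evaluate(output, c.check) });
  }
  const passed = results.filter((r) => r.pass).map((r) => r.name);
  const passRate = cases.length === 0 ? 0 : passed.length / cases.length;
  return { results, passed, passRate };
}

// Mock model: canned replies keyed by prompt, so no API key or network is used.
const canned = {
  "What is the capital of France? Answer with just the city name.": "Lyon",
  "Say hello.": "Hello there!",
};
const mockModel = async ({ prompt }) => canned[prompt] ?? "";

// Stand-in checker that handles only bare-string "contains" checks.
const containsCheck = (output, check) =>
  output.includes(check) ? { pass: true } : { pass: false, reason: `expected output to contain ${JSON.stringify(check)}` };

const cases = [
  { name: "answers capital of France", prompt: "What is the capital of France? Answer with just the city name.", check: "Paris" },
  { name: "greets the user", prompt: "Say hello.", check: "Hello" },
];

runCases(cases, mockModel, containsCheck).then((run) => {
  console.log(`pass-rate ${(run.passRate * 100).toFixed(1)}%`); // pass-rate 50.0%
});
```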

Status / roadmap

v0.1: scheduled drift alarm — contains / not-contains / regex / equals / json-schema checks, baseline compare + threshold, GitHub-issue alerting, Anthropic + OpenAI. 42 unit tests (npm test).
Next (evidence-driven, not yet built): optional LLM-as-judge check, per-case history/trend, Slack/webhook alert sink besides GitHub issues.

License

MIT
