Skip to content

forkadarshp/MPort

ModelPort

CI License: MIT Codex Skill PRs Welcome Status: Alpha

ModelPort social preview

Ship model upgrades without breaking prod — then prove it.

Two things in one agent skill: behavior-preserving migrations across prompts, agents, tools, API callers, and tests — and built-in evals that score your old vs. new system so every upgrade is backed by numbers, not vibes. Works in Claude Code, Codex, and Cursor.

Install in one line: npx skills add forkadarshp/MPort — then tell your agent to migrate. See Quickstart.

The model ID is one line. The behavior around it is everything else. Most migrations break because search-and-replace silently shatters tool calling, parsers, and orchestration. ModelPort treats a swap as behavior preservation:

  # the naive migration         ┄ what a find/replace does
  $ grep -rl 'opus-4-7' . | xargs sed -i 's/4-7/4-8/g'
  x tool calls break      x parser drift        x silent prod regressions

  # the ModelPort migration     ┄ behavior-preserving
  > migrate this repo to claude-opus-4-8, keep behavior, prove it
  ✓ callers mapped        ✓ contract held       ✓ tests + rollback evidence

Star it if it's saved you a migration headache — it helps other teams find it.

Now shipping: Claude Opus 4.8 (released 2026-05-28)

Opus 4.8 is live, and ModelPort already ships a dedicated migration guide for it: references/models/claude-opus-4-8.md. It covers the 4.7 → 4.8 drop-in path, the 4.6 → 4.7 breaking changes you must apply first (non-default sampling params now 400, manual extended thinking → {type: "adaptive"} + effort, new tokenizer, removed assistant prefills), and effort re-baselining. Point the skill at your repo:

Use $modelport-skill to migrate this codebase to claude-opus-4-8.
Apply required compatibility fixes first, keep tool + output behavior stable,
and return validation evidence plus rollback notes.

How it works

ModelPort treats a model swap as a behavior-preservation problem, not a search-and-replace. The model ID is one line; the behavior around it — tool calls, parsers, output contracts, prompts, orchestration — is everything else. The skill drives a strict, phase-gated pipeline that maps every caller, classifies each hit before editing, keeps required fixes apart from optional tuning, and refuses to claim "done" without proof.

╭──────────────────────────────────────────────────────────────────────╮
│                                                                      │
│   [ Opus 4.7 ]  ═══>  ( ModelPort )  ═══>  [ Opus 4.8 ]              │
│   drop-in upgrade    behavior-preserving    GA · 2026-05-28          │
│                                                                      │
│   » phase 0   lock scope, read current state ················· [ OK ]│
│   » phase 1   discover callers, prompts, agents, tools ······· [ OK ]│
│   » phase 2   classify every hit before editing ·············· [ OK ]│
│   » phase 3   design: required fixes vs optional tuning ······ [ OK ]│
│   » phase 4   surgical patches, no blind search/replace ······ [ OK ]│
│   » phase 5   validate: tests + live smoke + contract ········ [ OK ]│
│   » phase 6   report: proof evidence + rollback notes ········ [ OK ]│
│                                                                      │
╰──────────────────────────────────────────────────────────────────────╯

More than a model-ID swap

ModelPort doesn't just rewrite model="...". It updates prompts, tool descriptions, and guardrails to match how the target model actually behaves — including the undocumented quirks that never make the changelog.

Take chess: GPT-3.5 played badly, GPT-4o was a leap ahead — not from a rules change, but because 4o's training data held far more chess games. That kind of capability shift never appears in the release notes. ModelPort's research surfaces these advertised and unadvertised behaviors — from observed runs, prior migrations, web research, and provider changelogs — and folds the matching prompt and tooling changes into the migration, so the new model is actually used to its strengths instead of just dropped in.

Outcomes: with vs. without ModelPort

Skipping the discipline is exactly where migrations look fine in review and then silently regress in production — broken tool calls, drifted output contracts, no proof, and no way back.

╭───────────────────────────────────┬───────────────────────────────────╮
│  WITHOUT ModelPort                │  WITH ModelPort                   │
├───────────────────────────────────┼───────────────────────────────────┤
│   x  find/replace the model ID    │   ✓  scope-locked discovery       │
│   x  tool calls break silently    │   ✓  callers mapped, not guessed  │
│   x  parser / output drift        │   ✓  output contract preserved    │
│   x  prompt hacks carry over      │   ✓  fixes vs tuning kept apart   │
│   x  "looks fine" — no proof      │   ✓  tests + live smoke evidence  │
│   x  no way back on breakage      │   ✓  rollback notes included      │
├───────────────────────────────────┼───────────────────────────────────┤
│   →  silent prod regressions      │   →  behavior-preserving upgrade  │
╰───────────────────────────────────┴───────────────────────────────────╯

Benchmark the upgrade — built-in evals

Migrations shouldn't be a leap of faith. Opt in and ModelPort ends with measured evidence, not vibes — it runs the same eval set against three configurations so the raw model delta and the skill's added value are attributed separately:

  • Baseline — old model + old prompts (where you started)
  • Naive swap — new model + old prompts (what a find/replace would get you)
  • ModelPort-enhanced — new model + skill-enhanced system (where you land)

A raw swap tends to regress exactly the things that don't survive a model change on their own — output contracts, tool calls, and parse-able format — because newer models follow instructions more literally and re-tokenize differently. The enhanced arm is where the skill's fixes earn their keep. Illustrative leaderboard (your run produces the actual measured numbers):

╭──────────────────────┬────────────┬────────────┬────────────╮
│ metric               │ baseline   │ naive swap │ ModelPort  │
│                      │ (old/old)  │ (new/old)  │ (new/enh.) │
├──────────────────────┼────────────┼────────────┼────────────┤
│  task success        │        82% │        84% │        91% │
│  output contract     │        93% │        89% │        98% │
│  tool-call accuracy  │        90% │        83% │        96% │
│  p95 latency         │       3.9s │       2.4s │       2.6s │
│  cost / req          │     $0.052 │     $0.047 │     $0.045 │
│  refusal / halluc.   │       2.8% │       2.1% │       1.4% │
├──────────────────────┼────────────┼────────────┼────────────┤
│  composite (50/30/20)│       0.80 │       0.83 │       0.91 │
╰──────────────────────┴────────────┴────────────┴────────────╯

Numbers above are illustrative, not measured results — latency and cost in particular move with the target model and are always reported as measured, never assumed. Methodology, metric definitions, and composite scoring live in references/benchmarking.md.

Run it yourself. A bundled harness turns this into a closed loop: python3 harness/run.py scores the three arms on an eval set, and harness/iterate.py sweeps prompt revisions so you watch the score climb until it plateaus. Two scenarios ship (support triage + multi-tool routing); it runs offline by default, or --provider anthropic for measured numbers. See harness/README.md.

Why ModelPort

  • Replace deprecated model IDs without breaking runtime calls.
  • Preserve output contracts, parser behavior, and tool calling.
  • Separate required compatibility fixes from optional prompt tuning.
  • Choose direct swap, adapter, shadow mode, canary, or phased rollout.
  • Produce validation evidence and rollback notes.
  • Benchmark pre vs. post (and raw-swap control) to quantify the upgrade.

Quickstart

Install once, use everywhere — the universal skills CLI wires ModelPort into Codex, Claude Code, Cursor, and more:

npx skills add forkadarshp/MPort

Installing for every agent on your machine? Add --agent '*':

npx skills add forkadarshp/MPort --agent '*'

Manual Installation (Advanced)

If you prefer to fork and modify the skill locally:

Clone:

git clone https://github.com/forkadarshp/MPort.git
cd MPort

Install as a local skill with a symlink (Codex example):

mkdir -p ~/.codex/skills
ln -s "$(pwd)" ~/.codex/skills/modelport-skill

Or copy directly:

mkdir -p ~/.codex/skills/modelport-skill
cp -R . ~/.codex/skills/modelport-skill

Then ask Codex:

Use $modelport-skill and the starter prompt template to migrate
services/assistant from Model A to Model B. Keep tool behavior stable, separate
required fixes from optional tuning, and include validation evidence plus
rollback notes.

For detailed migration requests, start from templates/migration-starter-prompt.md.

What it migrates

Area Migration support
API callers Model IDs, request parameters, response contracts, SDK usage
Prompts System prompts, templates, output contracts, brittle prompt hacks
Agents Tool policy, orchestration behavior, autonomy, handoff instructions
Tools Schemas, descriptions, validation rules, guardrails
Config Model registries, routers, feature flags, rollout controls
Tests and docs Fixtures, assertions, examples, migration notes
Benchmarks Optional pre/post stats: quality, latency, cost, contract drift

Built for real migrations

  • Scope lock before edits to avoid accidental repo-wide churn.
  • Discovery pass across API calls, prompts, agents, tools, tests, and docs.
  • File classification before modification.
  • Migration-pattern selection across direct swap, adapters, shadow mode, canary, and expand-contract changes.
  • Required fixes and optional tuning reported separately.
  • Explicit coordination checks for work another agent has already started.
  • Validation-first completion criteria with tests, smoke checks, and rollback notes.

Example migration prompts

  • "Migrate all model usage in src/agents/ from our old model config to the new release, keep tool behavior stable, and give me a rollback plan."
  • "Upgrade this codebase from provider A to provider B and adjust prompts so output format remains identical."
  • "Replace deprecated model parameters in our API callers and verify tests still pass."
  • "Do a phased migration in services/search with side-by-side support for old and new models."

More examples are in examples/example-prompts.md.

Expected output

The skill guides Codex to produce a migration report with:

  • Migration summary
  • Scope and files changed
  • Existing work and remaining work
  • Required compatibility fixes
  • Prompt and agent behavior tuning
  • Validation results
  • Proof evidence
  • Risks and follow-up recommendations
  • Rollback notes
  • Benchmark results (when benchmarking was requested)

See examples/example-output.md for a sample report.

Repository structure

.
├── .markdownlint-cli2.yaml
├── SKILL.md
├── README.md
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
├── SECURITY.md
├── LICENSE
├── assets/
│   └── social-preview.svg
├── scripts/
│   └── validate_skill.py
├── templates/
│   └── migration-starter-prompt.md
├── references/
│   ├── benchmarking.md
│   ├── migration-patterns.md
│   ├── migration-playbook.md
│   ├── output-style.md
│   ├── prompt-template-research.md
│   ├── provider-checklists.md
│   ├── validation-proof.md
│   └── models/
│       ├── claude-opus-4-8.md
│       ├── claude-opus-4-6.md
│       ├── claude-sonnet-4-5.md
│       └── gpt-5-5.md
├── examples/
│   ├── example-prompts.md
│   └── example-output.md
├── agents/
│   └── openai.yaml
├── docs/
│   └── PUBLISHING.md
└── .github/
    ├── ISSUE_TEMPLATE/
    ├── pull_request_template.md
    └── workflows/ci.yml

Validate locally

python3 scripts/validate_skill.py .
npx markdownlint-cli2 "**/*.md"

Open-source readiness

  • MIT license included.
  • CI validates Markdown and skill structure.
  • Issue templates cover bugs, features, and migration case studies.
  • PR template asks contributors to classify migration relevance.
  • Security reporting process included via GitHub Security Advisories.
  • Skill UI metadata included in agents/openai.yaml.

Roadmap

  • Per-model migration guides for Claude Opus 4.8 / 4.6, Sonnet 4.5, and GPT-5.5 ship today in references/models/. Next: Gemini and self-hosted models.
  • Add regression fixtures for prompt, tool, and structured-output migrations.
  • Ship a runnable benchmark harness to auto-collect the three-arm leaderboard (the methodology lands today in references/benchmarking.md).
  • Add real-world before/after migration case studies.

FAQ

How do I migrate to Claude Opus 4.8?

Install ModelPort, then ask your agent: "migrate this repo to claude-opus-4-8." It reads the Opus 4.8 guide, applies the required compatibility fixes, preserves tool calling and output contracts, and returns validation evidence plus rollback notes.

How do I migrate from Claude Opus 4.7 to 4.8?

4.7 → 4.8 is a drop-in upgrade with no breaking API changes — the main task is re-baselining any tuned effort settings. ModelPort makes the model-ID change, flags the settings to retest, and validates that behavior is unchanged. Coming from 4.6 or earlier, it applies the 4.7 breaking changes first (sampling params, adaptive thinking, tokenizer, removed prefills).

Will changing the model ID break my tool calling or output format?

Often, yes — and that's the problem ModelPort exists to solve. Newer models follow instructions more literally and re-tokenize differently, so a raw find/replace can silently break tool calls and parsers. ModelPort goes beyond the model ID: it updates prompts, tool descriptions, and guardrails to match the target model's actual behavior — researched from observed runs, web research, and provider changelogs, including unadvertised quirks — then validates the contract before finishing.

How do I benchmark Opus 4.8 against my current model?

Opt into benchmarking at the start and ModelPort runs a three-arm comparison — baseline, naive swap, and the enhanced system — on the same eval set, then reports a leaderboard with quality, latency, cost, and contract metrics. See references/benchmarking.md.

Does it work with Claude Code, Codex, and Cursor?

Yes. It's a universal agent skill — install it once and invoke it from Claude Code, Codex, Cursor, or any compatible agent.

Which models and providers does it support?

Any model or provider swap (Anthropic, OpenAI, and others). It ships dedicated migration guides for Claude Opus 4.8 / 4.6, Claude Sonnet 4.5, and GPT-5.5, and a generic playbook for everything else.

Contributing

See CONTRIBUTING.md.

Security

See SECURITY.md.

License

MIT - see LICENSE.

About

Migrate LLM apps between models safely — then benchmark old vs. new to prove the upgrade. Agent skill for behavior-preserving migrations + a runnable eval harness (three-arm leaderboard, prompt sweeps). Works in Claude Code, Codex & Cursor.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages