Universal Agent Plugins & Skills Ecosystem

**Current Scale:** 11 Plugins · 145 Skills · 55 Sub-Agents. This is a self-improving, cross-platform library of reusable AI agent capabilities for Claude Code, GitHub Copilot, Gemini CLI, and any compliant agent framework.

Recent milestones: v1.3 — Hardened SQLite control plane (May 2026) · v1.4 — MAF synthesis & hybrid runtime strategy (May 31, 2026) · v1.5 — CLI Agents major update (June 2026)

Architecture Evolution

v1.3 — Hardened Control Plane (May 2026)

Replaced fragile markdown-based state with a transactional SQLite control plane (state_engine.py), added strong process sandboxing (sandbox_runner.py), HMAC-signed envelopes, approval gating, and WAL concurrency safety. Implementation is stdlib-only (sqlite3, hmac, hashlib, subprocess, os, secrets) — no framework dependencies. This made the custom Python kernel production-grade and laid the foundation for the v1.4 hybrid strategy.
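As a rough illustration of two of the primitives named above (HMAC-signed envelopes and a WAL-mode SQLite store), here is a minimal stdlib-only sketch. It is not the code in state_engine.py: the function names, the `envelopes` table, and the envelope layout are assumptions made for this example.

```python
import hashlib
import hmac
import json
import secrets
import sqlite3


def sign_envelope(payload: dict, key: bytes) -> dict:
    """Serialize the payload canonically and attach an HMAC-SHA256 tag."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    sig = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}


def verify_envelope(envelope: dict, key: bytes) -> dict:
    """Return the decoded payload, or raise ValueError if the tag does not match."""
    expected = hmac.new(key, envelope["body"].encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(expected, envelope["sig"]):
        raise ValueError("envelope signature mismatch")
    return json.loads(envelope["body"])


def open_state_db(path: str) -> sqlite3.Connection:
    """Open the database in WAL mode and create the (hypothetical) envelopes table."""
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a writer is active; in-memory databases ignore it.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS envelopes ("
        "id INTEGER PRIMARY KEY, body TEXT NOT NULL, sig TEXT NOT NULL)"
    )
    return conn


def record_envelope(conn: sqlite3.Connection, envelope: dict) -> None:
    """Insert one envelope inside a transaction (commit on success, rollback on error)."""
    with conn:
        conn.execute(
            "INSERT INTO envelopes (body, sig) VALUES (?, ?)",
            (envelope["body"], envelope["sig"]),
        )
```

A caller would generate a key once with `secrets.token_bytes(32)`, sign each state change before `record_envelope`, and call `verify_envelope` on every row it reads back, so an edit made outside the control plane fails verification.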

v1.4 — MAF Synthesis & Hybrid Strategy (May 31, 2026)

After extensive MAF research and 12 hands-on C# experiments (including full loading of real exploration-cycle-plugin manifests), we pivoted from "do not adopt MAF" to a hybrid architecture:

Manifest-first. Multiple certified runtime adapters second.

Key outcomes:

Kept the hardened Python control plane as the authoritative kernel
Adopted AGT (Agent Governance Toolkit) for deterministic policy enforcement
Ported 4 high-value patterns from MAF: alias resolution, standardized handoff envelopes, per-agent skill scoping, per-phase premium call budgets
MAF is now a certified optional runtime adapter alongside Claude Code, Copilot CLI, and Gemini CLI (ADR-007)
All .md agent manifests and SKILL.md files remain fully portable

This hybrid approach gives us the best of both worlds: battle-tested custom safety primitives + selective leverage of Microsoft's well-engineered patterns.
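One of the ported patterns, per-phase premium call budgets, can be pictured with a small counter like the one below. This is only a sketch under assumptions: the class name, the phase names, and the rule that unknown phases get a budget of zero are all hypothetical and not taken from the repository.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a phase asks for more premium calls than it was granted."""


@dataclass
class PhaseBudgets:
    # Maximum number of premium model calls allowed per phase (hypothetical names).
    limits: dict[str, int]
    used: dict[str, int] = field(default_factory=dict)

    def charge(self, phase: str, calls: int = 1) -> int:
        """Record `calls` premium calls for `phase` and return what remains."""
        limit = self.limits.get(phase, 0)  # assumption: unknown phases get no budget
        spent = self.used.get(phase, 0)
        if spent + calls > limit:
            raise BudgetExceeded(f"{phase}: {spent + calls} calls requested, limit {limit}")
        self.used[phase] = spent + calls
        return limit - self.used[phase]
```

Calling `charge` before every premium request makes the limit fail loudly at the call site instead of surfacing later as an unexpected bill.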

References: ADR-001 · ADR-002 · ADR-007

v1.5 — CLI Agents Major Update (June 2026)

The cli-agents plugin was promoted from a basic CLI dispatcher to a full multi-LLM task-routing suite with support for adversarial agent patterns.

Key outcomes:

run_agent.py task router: 6 backends, argparse v2, --isolated security contract, codex stdin pattern. 76 TDD tests across 3 files.
~2s wall clock for --cli llama direct HTTP to llama-server (measured: 1.977s). 20–30x faster than Mode A proxy path.
11 expert agent personas with structured analytical frameworks: OWASP, C4, SOLID, Big-O, TOGAF-level depth. Adversarial pattern family: red-team-reviewer, debate-synthesizer, output-validator, self-critic.
local-llm-setup skill with scripts/ symlinks: Day 1 bootstrap for macOS Metal / Windows CUDA/Vulkan / Linux CUDA/ROCm.
KV Cache Orchestrator (P0 collision fix): _extract_cache_key() returns None for system-prompt-free requests. 8 new proxy tests.
Plugin manifests (plugin.yaml, plugin.json, marketplace.json) fully corrected and aligned.

Platforms

A strictly cross-platform (Windows, Mac, Ubuntu) library — the universal upstream source for reusable AI agent plugins and skills across multiple IDEs and agent frameworks: Claude Code, GitHub Copilot, Gemini CLI, Antigravity, Roo Code, Windsurf, Cursor, and other compliant integrations.

All plugins deploy to the single .agents/ folder standard — no duplicate copies needed for .github, .gemini, .agent, etc.

Installation

Important

Start here — fresh clone or first-time setup. The single .agents/ environment directory is not committed to your repo. It will be empty by default.

All installation methods (uvx, bootstrap.py, npx skills, and Marketplace / Extension CLI) are now consolidated in a single authoritative guide:

👉 Go to INSTALL.md

Quick install (all plugins):

uvx --from git+https://github.com/richfrem/agent-plugins-skills plugin-add richfrem/agent-plugins-skills

v1.4 note: If upgrading from v1.3, run uv sync (or pip install -r requirements.txt) after pulling latest — the per-phase budget enforcement and AGT governance patterns add new dependencies to exploration-cycle-plugin.

Core Philosophy: Transitional Architectures & Decoupled Skills

This repository is built on a pragmatic acceptance of the current AI engineering landscape: the ecosystem changes weekly, and workflows that were revolutionary six months ago are obsolete today.

Frameworks like agent-agentic-os and spec-kitty are treated as Transitional Architectures — bridges between what agents need to do today and what native SDKs will eventually handle. When Anthropic, Google, and GitHub harden native memory persistence, execution safety, and multi-agent orchestration, large swaths of this tooling will be happily discarded.

The MAF research (May 2026) reinforced this view: instead of choosing between a custom kernel and a framework, we now deliberately pursue a hybrid model:

Portable .md manifests and SKILL.md files remain the source of truth across all runtimes
Multiple runtime adapters (Claude Code, Copilot CLI, Gemini CLI, MAF) are supported side-by-side
Strong custom control plane for safety and governance that no hosted framework currently matches
Selective adoption of excellent patterns from frontier frameworks (e.g. MAF's typed handoffs and AGT governance)

Skills are Applications; the SDK is the OS. Individual skills must function in complete isolation — no hard dependencies on sibling plugins, no assumptions about which framework is running.

Architecture

Pillar 1: The Improvement OS (`agent-agentic-os`)

The OS implements an eval-gated improvement pipeline for autonomous skill evolution:

os-architect           ← intent classifier + ecosystem router
    ↓
os-improvement-loop    ← learning engine: orchestrates multi-iteration improvement
    ↓
os-eval-runner         ← inner gate: KEEP/DISCARD per iteration (evaluate.py)
    ↓
os-eval-backport       ← human gate: review before lab winner → production
    ↓
os-experiment-log      ← scientific backbone: longitudinal tracking + synthesis

Entry point: /os-architect — describe what you want in plain language. The agent classifies intent, audits the ecosystem, proposes Path A/B/C, and dispatches via your available CLI tools. os-evolution-planner writes the task plan + delegation prompt. os-architect-tester validates after any changes.

Karpathy Autoresearch Loop

Skills that score HIGH on the autoresearch viability rubric (objectivity + speed + frequency + utility) can run fully autonomous self-improvement loops:

mutate SKILL.md → evaluate.py → exit 0 (KEEP) or exit 1 (DISCARD) → repeat

Not all skills are good candidates — use eval-autoresearch-fit to score a skill before running a loop.
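The mutate, evaluate, KEEP/DISCARD cycle can be sketched as a loop driven by the evaluator's exit code. The helper names, the way evaluate.py is invoked, and treating every non-zero exit code as DISCARD are assumptions for illustration, not the repository's implementation.

```python
import subprocess
import sys
from typing import Callable, Sequence


def default_eval_cmd(script: str = "evaluate.py") -> list[str]:
    """Hypothetical invocation: run the evaluator with the current interpreter."""
    return [sys.executable, script]


def gate(eval_cmd: Sequence[str]) -> bool:
    """Run the evaluator; exit code 0 means KEEP, anything else is treated as DISCARD."""
    return subprocess.run(list(eval_cmd)).returncode == 0


def improvement_loop(
    skill_text: str,
    mutate: Callable[[str], str],
    keep: Callable[[str], bool],
    iterations: int,
) -> str:
    """Propose a mutation each round and adopt it only when the gate says KEEP."""
    current = skill_text
    for _ in range(iterations):
        candidate = mutate(current)
        if keep(candidate):
            current = candidate  # KEEP: the candidate becomes the new baseline
        # DISCARD: fall through and mutate the unchanged baseline again
    return current
```

In a real run, `keep` would write the candidate to SKILL.md and then call `gate(default_eval_cmd())`; passing it in as a callable keeps the loop itself testable without touching the filesystem.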

Live example — convert-mermaid skill, 26 iterations across 2 rounds: 0.61 → 1.00

Each blue diamond is a baseline anchor (one per session). Green = new best score. Amber = kept but not a record. The two-segment shape shows a fresh re-baseline for round 2.

Monitor a live run: python plugins/agent-agentic-os/scripts/plot_eval_progress.py --tsv <lab>/evals/ --live

Flywheel layers:

OUTER flywheel (os-improvement-loop): improves OS-level protocols and session ledgers between sessions
INNER flywheel (os-eval-runner): evaluate.py KEEP/DISCARD gate per iteration within a session

Pillar 2: Execution Patterns (`agent-loops`)

5 composable primitives used as the execution substrate by the Improvement OS and standalone by any agent workflow:

learning-loop · dual-loop · agent-swarm · red-team-review · triple-loop-learning

Pillar 3: Super-RAG 3-Tier Retrieval

O(1) RLM keyword → O(log N) vector semantic → wiki concept nodes.

Super-RAG stack: rlm-factory (O(1) keyword) + vector-db (O(log N) semantic) + obsidian-wiki-engine (full concept nodes)

Each plugin works standalone (Mode A) or combined for full Super-RAG power. Init agents detect what is installed in .agents/skills/ and configure only the available layers.
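A minimal sketch of that detection step follows, assuming each layer is recognized by the presence of one marker skill directory under .agents/skills/. The skill names are taken from this README, but using them as layer markers, and both function names, are assumptions.

```python
from pathlib import Path

# Assumed mapping: retrieval layer -> a skill whose presence signals the layer is installed.
LAYER_MARKERS = {
    "rlm": "rlm-init",
    "vector": "vector-db-init",
    "wiki": "obsidian-wiki-builder",
}


def installed_skills(skills_dir: Path) -> set[str]:
    """Return the names of skill directories found under skills_dir (empty if missing)."""
    if not skills_dir.is_dir():
        return set()
    return {entry.name for entry in skills_dir.iterdir() if entry.is_dir()}


def available_layers(installed: set[str]) -> list[str]:
    """List the layers whose marker skill is installed, in tier order."""
    return [layer for layer, marker in LAYER_MARKERS.items() if marker in installed]
```

For example, `available_layers(installed_skills(Path(".agents/skills")))` would yield only the tiers that can actually be configured on the current machine.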

Hub-and-Spoke ADR

All shared scripts live once at plugins/<plugin>/scripts/. Skills reference them via file-level symlinks (skills/<skill>/scripts/script.py → ../../../scripts/script.py). Directory-level symlinks are forbidden — npx drops them on install.
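The file-level symlink layout can be reproduced with pathlib, as in the sketch below. The helper names are hypothetical. Note also that creating symlinks on Windows typically requires Developer Mode or elevated privileges.

```python
import os
from pathlib import Path


def relative_target(link: Path, target: Path) -> str:
    """Compute the target path relative to the directory that holds the link."""
    return os.path.relpath(target, start=link.parent)


def link_script(plugin_root: Path, skill: str, script_name: str) -> Path:
    """Create skills/<skill>/scripts/<script_name> as a symlink to the canonical script."""
    target = plugin_root / "scripts" / script_name
    link = plugin_root / "skills" / skill / "scripts" / script_name
    link.parent.mkdir(parents=True, exist_ok=True)  # a real directory, never a symlink
    if link.is_symlink() or link.exists():
        link.unlink()  # replace a stale link or an accidental copy
    link.symlink_to(relative_target(link, target))
    return link
```

Using a relative target keeps the link valid when the plugin directory is moved or installed to a different location.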

Plugin Ecosystem (11 plugins · 137 skills)

Group 1: The Improvement OS

agent-agentic-os — Continuous Self-Improvement

The flagship operational framework. Eval-gated improvement loops, memory management, session lifecycle, and ecosystem evolution orchestration.

Skills (17): os-architect · os-evolution-planner · os-guide · os-improvement-loop · os-eval-lab-setup · os-eval-runner · os-eval-backport · os-environment-probe · os-evolution-verifier · os-experiment-log · os-memory-manager · os-improvement-report · os-init · os-clean-locks · todo-check · optimize-agent-instructions · self-evolution

Agents (5): os-architect-agent · os-architect-tester-agent · improvement-intake-agent · os-health-check · agentic-os-setup

Group 2: Engineering Workflows

spec-kitty-plugin — Spec-Driven Development

Enterprise-grade Spec → Plan → Tasks → Implement → Review → Merge pipeline.

Skills (19): spec-kitty-specify · spec-kitty-plan · spec-kitty-tasks · spec-kitty-implement · spec-kitty-review · spec-kitty-merge · spec-kitty-analyze · spec-kitty-accept · spec-kitty-clarify · spec-kitty-research · spec-kitty-dashboard · spec-kitty-status · spec-kitty-checklist · spec-kitty-constitution · spec-kitty-tasks-outline · spec-kitty-tasks-finalize · spec-kitty-tasks-packages · spec-kitty-workflow · spec-kitty-sync-plugin

Agents: spec-kitty-agent · spec-kitty-setup

exploration-cycle-plugin — Discovery & Requirements

Autonomous discovery loop: idea framing → business requirements → user stories → prototype → handoff into formal engineering specs.

Skills (19): exploration-workflow · exploration-session-brief · discovery-planning · business-requirements-capture · business-workflow-doc · user-story-capture · exploration-handoff · exploration-optimizer · prototype-builder · visual-companion · subagent-driven-prototyping · vibe-browser-audit · vibe-behavioral-test-capture · vibe-domain-extractor · vibe-slice-migrator · vibe-reengineer · vibe-spec-packager · vibe-togaf-architect · vibe-to-speckit-superpowers

Agents (17): business-rule-audit-agent · certification-verifier · discovery-planning-agent · domain-purity-auditor · exploration-cycle-orchestrator-agent · handoff-preparer-agent · intake-agent · planning-doc-agent · problem-framing-agent · prototype-builder-agent · prototype-companion-agent · requirements-doc-agent · requirements-scribe-agent · runtime-observer-agent · semantic-drift-auditor · vibe-orchestrator-agent · subagent-driven-prototyping-agent

Group 3: Execution Patterns

agent-loops — Composable Loop Primitives

5 execution primitives used as the substrate for the Improvement OS and standalone agent workflows.

Skills (6): orchestrator · learning-loop · dual-loop · agent-swarm · red-team-review · triple-loop-learning

Agents: orchestrator

Group 4: Code Quality & Safety

agent-scaffolders — Boilerplate & Audit (30 skills)

Interactive creators for exact file hierarchies + structured audit framework for plugin architectural maturity.

Scaffolding skills: create-plugin · create-skill · create-sub-agent · create-command · create-hook · create-github-action · create-agentic-workflow · create-azure-agent · create-docker-skill · create-mcp-integration · create-stateful-skill

Audit & analysis skills: audit-plugin · audit-plugin-l5 · l5-red-team-auditor · analyze-plugin · self-audit · mine-skill · mine-plugins · path-reference-auditor · fix-plugin-paths · synthesize-learnings · eval-autoresearch-fit · manage-marketplace · ecosystem-standards · ecosystem-authoritative-sources

Group 5: CLI Sub-Agents

cli-agents — Multi-LLM Task Router (v2.0.0) — June 2026 Major Update

run_agent.py dispatches bounded tasks to 6 backends. Measured: ~2s wall clock for --cli llama (direct HTTP to llama-server, no proxy, no 29K system prompt overhead).

Skills (12):

local-llm-bridge — --cli llama: direct Gemma 4 12B, ~2s, no proxy
local-llm-setup — cross-platform setup wizard; scripts/ symlinks for Day 1 bootstrap + Mode B config
codex-cli-agent — --cli codex: Codex/OpenAI-compatible, prompt piped via stdin
agy-cli-agent — --cli agy: Antigravity CLI, frontier Gemini models
claude-cli-agent — --cli claude: Claude CLI, Haiku 4.5 default
copilot-cli-agent — --cli copilot: GitHub Copilot CLI, gpt-5-mini ⚠️ AI Credits June 2026
gemini-cli-agent — --cli gemini: Gemini CLI, gemini-3-flash-preview
claude-project-setup · antigravity-project-setup · project-setup · maf-adapter · agt-security

11 Expert Agent Personas (flat agents/ directory, shared across all backends):

| Persona | Role | Pattern Family |
| --- | --- | --- |
| `refactor-expert` | Code quality: SOLID/DRY smell taxonomy | Code Review |
| `security-auditor` | OWASP vulnerability audit | Code Review |
| `architect-review` | C4/SOLID structural review, layer violations | Code Review |
| `compliance-reviewer` | Coding standards drift detection | Code Review |
| `pr-reviewer` | Diff review: ship/hold decision | Code Review |
| `test-writer` | Unit test generation: all path types | Code Review |
| `performance-analyst` | Bottleneck analysis: Big-O, I/O amplification | Code Review |
| `red-team-reviewer` | Adversarial exploit analysis, attack surface | Adversarial |
| `debate-synthesizer` | Dialectical synthesis, conflict resolution | Adversarial |
| `output-validator` | Output guardrail: hallucination/schema/policy | Adversarial |
| `self-critic` | Reflection loop: task-fit, completeness check | Adversarial |

KV Cache Orchestrator: kv_cache_orchestrator.py — SHA-256 keyed slot save/restore, 4 GiB budget, 31 TDD tests. Proxy integration wired. Eviction scoring inspired by antirez/ds4.
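To illustrate the keying idea, the sketch below derives a SHA-256 cache key from the system prompt and returns None when a request has none, which matches the behavior described for _extract_cache_key() in the v1.5 notes. The OpenAI-style `messages` layout and the function body are assumptions; this is not the code in kv_cache_orchestrator.py.

```python
import hashlib
from typing import Optional


def extract_cache_key(request: dict) -> Optional[str]:
    """Return a SHA-256 hex key for the request's system prompt, or None if it has none."""
    # Assumption: chat-style payload with a list of {"role": ..., "content": ...} messages.
    parts = [
        message["content"]
        for message in request.get("messages", [])
        if message.get("role") == "system" and isinstance(message.get("content"), str)
    ]
    system_prompt = "\n".join(parts)
    if not system_prompt.strip():
        # Without this guard every prompt-free request would hash the empty
        # string and share one cache slot.
        return None
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()
```

Requests that share a system prompt map to the same key regardless of the user turn, so a saved slot can be restored for them, while prompt-free requests bypass the cache entirely.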

What changed in v2.0.0 (June 2026):

12 duplicate agent files (3 personas × 4 backends) → 11 deep flat personas with OWASP/C4/SOLID analytical frameworks
Added adversarial pattern family: red-team-reviewer, debate-synthesizer, output-validator, self-critic
run_agent.py argparse v2: --cli, --model, --max-tokens, --isolated + legacy positional compat
Security contract: --isolated suppresses --yolo/--dangerously-skip-permissions per backend
Codex stdin: codex exec --model M - (avoids ARG_MAX + process listing exposure)
local-llm-setup skill with scripts/ symlinks for Day 1 bootstrap
plugin.yaml stale skills list corrected (4 non-existent local-llm-bridge-* removed; all 12 real skills listed)
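Two of the changes above, the --isolated security contract and the Codex stdin pattern, can be sketched as follows. The `codex exec --model M -` shape and the two bypass flag names come from the list above; the helper names and the idea of filtering a prebuilt argument list are assumptions, not how run_agent.py is necessarily written.

```python
import subprocess
from typing import Sequence

# Flags named in the security contract; --isolated must never forward them.
BYPASS_FLAGS = frozenset({"--yolo", "--dangerously-skip-permissions"})


def build_codex_command(model: str) -> list[str]:
    """Build the Codex invocation; the trailing '-' tells it to read the prompt from stdin."""
    return ["codex", "exec", "--model", model, "-"]


def apply_isolation(argv: Sequence[str], isolated: bool) -> list[str]:
    """Drop permission-bypass flags when running in isolated mode."""
    if not isolated:
        return list(argv)
    return [arg for arg in argv if arg not in BYPASS_FLAGS]


def run_with_stdin(argv: Sequence[str], prompt: str, timeout: float = 120.0):
    """Pipe the prompt via stdin so it never appears in the process argument list."""
    return subprocess.run(
        list(argv), input=prompt, text=True, capture_output=True, timeout=timeout
    )
```

Passing the prompt through stdin sidesteps the operating system's argument-length limit and keeps the prompt text out of process listings.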

Execution Disciplines — Safety & Quality

Behavioural guardrails enforcing best practices on every coding session. These skills come from obra/superpowers — install that plugin to get them.

Install: uvx --from git+https://github.com/richfrem/agent-plugins-skills plugin-add obra/superpowers

Skills available via superpowers: verification-before-completion · test-driven-development · using-git-worktrees · systematic-debugging · finishing-a-development-branch · requesting-code-review

Group 6: Knowledge & Memory

agent-memory — Unified Cognitive Memory Suite (v1.0.0)

Three standalone plugins consolidated: rlm-factory (O(1) keyword search) + vector-db (semantic search) + memory-management (session tiering). Works standalone per layer or combined as a full Super-RAG stack.

RLM skills (6): rlm-init · rlm-curator · rlm-search · rlm-distill-agent · rlm-cleanup-agent · rlm-audit

Vector DB skills (6): vector-db-init · vector-db-launch · vector-db-ingest · vector-db-search · vector-db-cleanup · vector-db-audit

Session memory (1): memory-management — multi-tiered cognition and context caching

Agents (9): rlm-cleanup-agent · rlm-curator · rlm-distill-agent · rlm-factory-init-agent · rlm-init · rlm-search · vector-db-cleanup · vector-db-ingest · vector-db-init-agent

obsidian-wiki-engine — Karpathy LLM Wiki + Super-RAG (v3.1.0)

Karpathy-style LLM wiki with cross-source concept synthesis. Transforms raw markdown into structured, queryable concept nodes. Full Obsidian vault CRUD, canvas, and graph traversal. Pairs with agent-memory as Phase 3 of the Super-RAG stack.

Wiki skills: obsidian-wiki-builder · obsidian-rlm-distiller · obsidian-query-agent · obsidian-wiki-linter

Vault skills: obsidian-init · obsidian-vault-crud · obsidian-canvas-architect · obsidian-graph-traversal · obsidian-markdown-mastery · obsidian-bases-manager

Setup agents: wiki-init-agent · wiki-build-agent · wiki-distill-agent · wiki-lint-agent · wiki-query-agent · super-rag-setup-agent

Group 7: Infrastructure & Utilities

dev-utils — Developer Utilities Suite (v1.1.0)

Nine standalone plugins consolidated into one. All tools are stateless and self-contained.

Skills (12): adr-management · coding-conventions-agent · context-bundler · convert-mermaid · hf-init · hf-upload · humanize · link-checker-agent · optimize-context · red-team-bundler · symlink-manager · task-agent

Agents (3): coding-conventions-agent · link-checker-agent · rsvp-comprehension-agent

plugin-manager — Ecosystem Sync

Skills (3): plugin-installer · plugin-remover · plugin-syncer

dependency-management — pip-compile Workflows

Cross-platform pip-compile with strict .in → .txt lockfile discipline.

Skills (1): dependency-management

Completed Experiments

Ecosystem Fitness Sweep v1 — COMPLETE (`temp/ecosystem-fitness-sweep-v1/`)

Scored 116 of 120 production skills for Karpathy autoresearch loop viability using GPT-5 mini via Copilot CLI. Each skill was scored on four criteria: objectivity (can a shell command measure it?), execution speed, frequency of use, and potential utility (maximum total: 40).

Top HIGH candidates:

| Rank | Skill | Score | Loop |
| --- | --- | --- | --- |
| 1 | superpowers/verification-before-completion | 35/40 | LLM_IN_LOOP |
| 2 | superpowers/test-driven-development | 35/40 | LLM_IN_LOOP |
| 3 | coding-conventions/coding-conventions-agent | 34/40 | HYBRID |
| 4 | superpowers/using-git-worktrees | 33/40 | DETERMINISTIC |
| 5 | spec-kitty-plugin/spec-kitty-status | 33/40 | DETERMINISTIC |
| 6 | agent-agentic-os/os-eval-runner | 32/40 | DETERMINISTIC |

Full ranked results: summary-ranked-skills.json

Top 20 opportunities with metrics + blockers: autoresearch-opportunities-report.md

Regenerate report:

python plugin-research/experiments/analyze-candidates-for-auto-reseaarch/skills/eval-autoresearch-fit/scripts/update_ranked_skills.py \
  --json-path plugin-research/experiments/analyze-candidates-for-auto-reseaarch/skills/eval-autoresearch-fit/assets/resources/summary-ranked-skills.json \
  --morning-report

Repository Structure

plugins/                    ← upstream source (11 plugins, 137 skills)
  <plugin>/
    plugin.yaml             ← plugin manifest
    .claude-plugin/plugin.json
    skills/<skill>/
      SKILL.md              ← skill definition (mutation target for autoresearch loops)
      evals/evals.json      ← routing evaluation suite (should_trigger boolean schema)
      evals/results.tsv     ← per-experiment score history
      scripts/              ← file-level symlinks → ../../../scripts/
    scripts/                ← canonical scripts (shared via symlinks, never duplicated)
    agents/                 ← sub-agent .md definitions
    commands/               ← slash commands
    assets/diagrams/        ← architecture diagrams

.agents/                    ← deployed skill copies (bridge installer output)
  skills/
  agents/

plugin-research/            ← experiments and autoresearch infrastructure
  experiments/
    analyze-candidates-for-auto-reseaarch/

temp/                       ← local scratch (gitignored except scripts)
  ecosystem-fitness-sweep-v1/

137 skills · 11 plugins · Improvement OS (os-architect) · Karpathy autoresearch loops · Super-RAG 3-tier retrieval
