Internal Safety Collapse: turning an LLM or an AI agent into a sensitive-data generator.
Introducing XSafeClaw: The Open-Source Agent Safety Platform from Fudan University
An open taxonomy and scoring framework for evaluating AI agent sandboxes: 7 defense layers, 7 threat categories, 3 evaluation dimensions, 27 "sandboxes" scored.
Human-in-the-loop execution for LLM agents
Security scanner for AI agent tool definitions
Claude Code agent-in-container orchestration and automation
Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1 on 5,391 trajectories).
Audit log + guard for AI agents. Passive logging, human-in-the-loop approval for dangerous ops (rm, drop, transfer) via Telegram. Diary, daily digest, timeline UI. Cursor & MCP ready. Cloudflare Workers + Hono + D1.
Deterministic Guardrails for AI Agents. Ark acts as a logic-based firewall, preventing unauthorized actions through a rigorous rule engine. Ensure your AI behaves exactly as intended.
Deterministic execution authorization for AI agents
🛡️ Safe AI agents through an action classifier
Runtime network egress control for Python. One function call to restrict which hosts your code can connect to.
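The egress-allowlist idea behind this entry is easy to picture. A minimal sketch, assuming nothing about the project's actual API (the `restrict_egress` name and the patching strategy are hypothetical; real tools hook lower layers and also cover TLS, asyncio, and subprocesses):

```python
import socket

_original_create_connection = socket.create_connection

def restrict_egress(allowed_hosts):
    """Allow outbound connections only to hosts on the allowlist."""
    allowed = set(allowed_hosts)

    def guarded(address, *args, **kwargs):
        host = address[0]
        if host not in allowed:
            # Fail closed: anything off the allowlist is blocked.
            raise PermissionError(f"egress to {host!r} blocked by allowlist")
        return _original_create_connection(address, *args, **kwargs)

    socket.create_connection = guarded

# One call before running untrusted agent code:
restrict_egress({"api.example.com"})
```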
Open Threat Classification (OTC) — 10 threat patterns for AI agent skills, MCP servers, and plugins. CC-BY-4.0.
Guardrails for LLMs: detect and block hallucinated tool calls to improve safety and reliability.
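The usual mechanism behind this kind of guardrail is validation against the tool registry: every tool call the model proposes is checked against the tools actually declared to the agent, and anything invented is rejected. A minimal sketch with made-up tool names:

```python
# Registered tools and the parameter names each one accepts.
REGISTERED_TOOLS = {
    "search_web": {"query"},
    "read_file": {"path"},
}

def validate_tool_call(name: str, arguments: dict) -> None:
    """Raise if the model proposed a tool or parameter that does not exist."""
    if name not in REGISTERED_TOOLS:
        raise ValueError(f"hallucinated tool: {name!r} is not registered")
    unknown = set(arguments) - REGISTERED_TOOLS[name]
    if unknown:
        raise ValueError(f"unknown parameters for {name!r}: {sorted(unknown)}")

validate_tool_call("read_file", {"path": "/tmp/notes.txt"})   # passes
# validate_tool_call("delete_repo", {"name": "prod"})         # raises
```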
Policy engine for AI agents — enforceable rules, risk limits, approval gates, obligation tracking, and violation detection. One .acon file. Rust core + MCP server.
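The general shape of such a policy engine is a first-match rule list mapping proposed actions to allow, deny, or require-approval decisions. The sketch below illustrates only that shape; the rule fields are invented and are not the project's .acon syntax:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]
    effect: str  # "allow" | "deny" | "require_approval"

RULES = [
    Rule("no-deletes", lambda a: a.get("type") == "delete", "deny"),
    Rule("cap-transfers",
         lambda a: a.get("type") == "transfer" and a.get("amount", 0) > 100,
         "require_approval"),
]

def evaluate(action: dict) -> str:
    for rule in RULES:   # first matching rule wins
        if rule.matches(action):
            return rule.effect
    return "allow"       # default-allow for brevity; real engines often default-deny

print(evaluate({"type": "transfer", "amount": 500}))  # require_approval
```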
🛡️ Open-source safety guardrail for AI agent tool calls. <2ms, zero dependencies.
27 free, open-source plugins for Claude Code & Cowork — Google Drive, WhatsApp, YouTube, WordPress, Apollo & more. Built on the SOSA™ security framework.
Execution control layer for AI agents — prevents duplicate or incorrect real-world actions under retries, uncertainty, and stale context.
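Deduplicating real-world actions under retries usually comes down to an idempotency key: derive a stable key from the action's intent and refuse to execute the same key twice. A minimal in-memory sketch (hypothetical names; a production system would persist the key store durably):

```python
import hashlib
import json

_executed: dict[str, object] = {}  # action key -> result

def run_once(action: dict, execute):
    """Execute an action at most once, keyed on its canonicalized content."""
    key = hashlib.sha256(
        json.dumps(action, sort_keys=True).encode()
    ).hexdigest()
    if key in _executed:           # a retry of an action that already ran
        return _executed[key]
    result = execute(action)
    _executed[key] = result
    return result

pay = {"type": "transfer", "to": "acct-42", "amount": 10}
run_once(pay, lambda a: print("executing", a))
run_once(pay, lambda a: print("executing", a))  # retry: side effect not repeated
```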
The missing safety layer for AI Agents. Adaptive High-Friction Guardrails (Time-locks, Biometrics) for critical operations to prevent catastrophic errors.
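Of the high-friction gates named above, the time-lock is the simplest to sketch: a critical action is recorded first and becomes executable only after a cooling-off delay, leaving a window for a human to cancel it. The class below is a hypothetical illustration, not this project's API:

```python
import time

class TimeLock:
    """Delay execution of critical actions so a human can intervene."""

    def __init__(self, delay_seconds: float):
        self.delay = delay_seconds
        self.pending: dict[str, float] = {}  # action id -> request time

    def request(self, action_id: str) -> None:
        self.pending[action_id] = time.monotonic()

    def cancel(self, action_id: str) -> None:
        self.pending.pop(action_id, None)

    def execute(self, action_id: str, run) -> None:
        requested = self.pending.get(action_id)
        if requested is None:
            raise PermissionError("action never requested, or cancelled")
        if time.monotonic() - requested < self.delay:
            raise PermissionError("time-lock still active; try again later")
        del self.pending[action_id]
        run()
```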
Drop-in prompt patterns, policy blocks, and checklists for building safe AI agents.