A curated list of AI security resources, tools, research papers, and more.
Focused on LLM security, prompt injection, jailbreaks, AI agents, and RAG systems.
- Research Papers
- Tools
- Articles & Blogs
- Courses & Training
- CTF & Challenges
- Videos & Talks
- Vulnerability Databases
- Companies & Services
- People to Follow
- Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs - Large-scale prompt injection competition analysis
- Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection - Foundational indirect prompt injection paper
- Prompt Injection Attack Against LLM-integrated Applications - Systematic analysis of prompt injection vectors
- Demystifying RCE Vulnerabilities in LLM-Integrated Apps - Remote code execution through LLM exploitation
- Tensor Trust: Interpretable Prompt Injection Attacks - Game-theoretic approach to prompt injection
- Prompt Injection Attacks and Defenses in LLM-Integrated Applications - Comprehensive attack taxonomy
- Ignore Previous Prompt: Attack Techniques For Language Models - Early systematic study of prompt injection
- Universal and Transferable Adversarial Attacks on Aligned Language Models - GCG adversarial suffix attacks
- Jailbroken: How Does LLM Safety Training Fail? - Analysis of jailbreak techniques
- Do Anything Now: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models - Study of real-world jailbreak prompts
- Jailbreaking ChatGPT via Prompt Engineering - Prompt engineering jailbreak techniques
- Multi-step Jailbreaking Privacy Attacks on ChatGPT - Multi-turn jailbreak strategies
- AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models - Automated jailbreak generation
- MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots - Cross-model jailbreaking
- Poisoning Retrieval Corpora by Injecting Adversarial Passages - Corpus poisoning attacks on RAG
- Backdoor Attacks on Dense Passage Retrievers for Disseminating Misinformation - RAG backdoor attacks
- PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation - Systematic RAG poisoning
- Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models - RAG indirect injection defense
- InjectAgent: Indirect Prompt Injection Attacks on Autonomous Agents - Agent-specific injection attacks
- AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents - Agent security benchmark
- ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning - Tool-use security analysis
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents - Agent safety benchmark
- garak - LLM vulnerability scanner with extensive probe library
- PyRIT - Microsoft's Python Risk Identification Toolkit for generative AI
- Promptmap - Automatic prompt injection testing
- LLM-Attacks - GCG adversarial attack implementation
- TextAttack - NLP adversarial attack framework
- Adversarial Robustness Toolbox - IBM's ML security library
- llm-security-payloads - Curated LLM attack payload collection
- NeMo Guardrails - NVIDIA's programmable guardrails for LLMs
- Guardrails AI - Input/output validation for LLMs
- LLM Guard - Security toolkit for LLM interactions
- Rebuff - Prompt injection detection
- Vigil - LLM prompt injection scanner
- Lakera Guard - Commercial prompt injection protection
- AgentAudit - Automated AI security testing platform
- Protect AI - ML/AI security platform
- HiddenLayer - AI security monitoring
- CalypsoAI - LLM security scanning
- OWASP Top 10 for LLM Applications - Essential LLM security reference
- Prompt Injection: What's the worst that can happen? - Simon Willison's prompt injection overview
- The AI Attack Surface Map v1.0 - Daniel Miessler's attack surface taxonomy
- Securing LLM Systems Against Prompt Injection - NVIDIA's defense guide
- Anthropic's Responsible Disclosure Policy - AI safety disclosure practices
- Google's Secure AI Framework (SAIF) - Enterprise AI security framework
- Red Teaming Language Models with Language Models - DeepMind's automated red teaming
- Lessons Learned on LLM Safety - OpenAI GPT-4 system card
- Embrace The Red: LLM Security - Johann Rehberger's AI security blog
- Hacking Auto-GPT and LangChain - Agent exploitation walkthrough
- Jailbreaking GPT-4's Code Interpreter - Code interpreter bypass
- LLM Security: Prompt Injection & Data Exfiltration - Cobalt's security analysis
- The Dual LLM Pattern for Building AI Assistants - Architectural defense pattern
- NVIDIA: Securing LLM Applications - Free LLM security course
- Lakera Prompt Injection Course - Free prompt injection fundamentals
- HackAPrompt Competition - Learn by competing
- Damn Vulnerable LLM Agent - Hands-on vulnerable agent
- SANS SEC595: Applied Data Science and AI/ML for Cybersecurity - Comprehensive AI security
- Offensive AI (OffSec) - Offensive security with AI focus
- AI Red Team Professional - AI red teaming certification
- Gandalf by Lakera - Progressive prompt injection challenge
- GPT Prompt Attack - Prompt injection CTF
- Prompt Airlines - Interactive jailbreak game
- HackAPrompt - Large-scale prompt injection competition
- TensorTrust - PvP prompt injection game
- Crucible by Dreadnode - AI security CTF platform
- AI Village CTF - DEF CON AI security challenges
- Prompt Injection Playground - Practice environment
- DEF CON 31 - Compromising LLMs: The Advent of AI Malware - AI malware and exploitation
- Black Hat 2023 - Hacking AI: Security Implications of ML Models - ML model security
- DEF CON 31 AI Village - Indirect Prompt Injection - Kai Greshake on indirect injection
- BSides SF 2024 - LLM Security Deep Dive - Latest LLM security talks
- John Hammond - ChatGPT Jailbreaks - Popular jailbreak demos
- LiveOverflow - Hacking AI - Technical AI exploitation
- David Bombal - AI Security - AI security interviews
- NVIDIA AI Enterprise - LLM Security - Enterprise LLM security
- AI Incident Database - Real-world AI failure database
- AVID (AI Vulnerability Database) - ML vulnerability taxonomy
- MITRE ATLAS - Adversarial Threat Landscape for AI Systems
- NIST AI Risk Management Framework - AI risk standards
- CVE - AI Related - Traditional CVEs for AI systems
- XSource_Sec - AI red teaming and AgentAudit platform
- Lakera - Prompt injection protection
- Protect AI - ML security platform
- HiddenLayer - AI threat detection
- CalypsoAI - LLM security scanning
- Robust Intelligence - AI validation platform
- Adversa AI - AI red teaming
- Preamble - AI guardrails
- Anthropic Safety - Constitutional AI research
- OpenAI Red Teaming - GPT safety and red teaming
- Google DeepMind Safety - AI safety research
- Microsoft Responsible AI - Azure AI security
| Name | Handle | Focus |
|---|---|---|
| Simon Willison | @simonw | Prompt injection research |
| Johann Rehberger | @waborel | AI red teaming |
| Kai Greshake | @kai_greshake | Indirect prompt injection |
| Daniel Miessler | @danielmiessler | AI security frameworks |
| Sander Schulhoff | @SSchulhworthy | HackAPrompt organizer |
| Rich Harang | @richharang | NVIDIA AI security |
| Pliny the Prompter | @elder_plinius | Jailbreak research |
| Jailbreak Chat | @jailbreakchat | Jailbreak aggregation |
Contributions welcome! Please read the Contributing Guide first.
- Add new resources via Pull Request
- Ensure links are working and relevant
- Follow the existing format
Maintained by XSource_Sec
If you find this useful, please β star the repository!