awesome-prompt-injection

github.com/Joe-B-Security/awesome-prompt-injection ↗

Learn about a type of vulnerability that specifically targets machine learning models

567 GitHub Stars

Curated Resources


What's inside

Articles and Blog posts

Adversarial Prompting
A guide on the various types of adversarial prompting and ways to mitigate them.
ChatGPT Plugins: Data Exfiltration via Images & Cross Plugin Request Forgery
This post shows how a malicious website can take control of a ChatGPT chat session and exfiltrate the history of the conversation.
Continuously Hardening ChatGPT Atlas Against Prompt Injection Attacks
OpenAI's Dec 2025 disclosure of a real attack chain (malicious email → agent sends resignation letter) and the RL-trained automated attacker they built to find new injection classes before external adversaries do. OpenAI explicitly states deterministic guarantees are not achievable.
Design Patterns for Securing LLM Agents against Prompt Injections
An overview of strategies for mitigating the risk of prompt injection.
Don't you (forget NLP): Prompt injection with control characters in ChatGPT
A look from Dropbox at how to achieve prompt injection using control characters.
How Microsoft Defends Against Indirect Prompt Injection Attacks
Microsoft MSRC's Jul 2025 post on FIDES, an information-flow control system enforcing privilege separation and prompt isolation to deterministically block indirect prompt injection (IPI) in Copilot-class agents.
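
As a rough illustration of the control-character issue described in the Dropbox entry above, the sketch below removes control characters from user-supplied text before it is placed in a prompt. It is a minimal input-hygiene example, not the mitigation described in that post: the function name `strip_control_chars` and the choice to keep newlines and tabs are assumptions made here for illustration.

```python
import unicodedata

# Assumption: newlines and tabs are legitimate in user input and are kept.
ALLOWED_CONTROL_CHARS = {"\n", "\t"}


def strip_control_chars(text: str) -> str:
    """Drop Unicode control characters (category "Cc") except the allowed ones."""
    return "".join(
        ch
        for ch in text
        if ch in ALLOWED_CONTROL_CHARS or unicodedata.category(ch) != "Cc"
    )


# Backspaces and a carriage return embedded in otherwise ordinary input.
user_input = "hi\b\b\r there\n"
cleaned = strip_control_chars(user_input)
```

Filtering like this only narrows one input channel. It does not make a prompt safe on its own and is best treated as one layer among several.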

Introduction Resources

Agents Rule of Two: A Practical Approach to AI Agent Security
Meta's Oct 2025 framework stating that agents must satisfy no more than two of: (A) processing untrustworthy inputs, (B) access to sensitive data, (C) ability to change state externally — a deterministic architectural approach to bounding blast radius.
Prompt Injection in 2026: Why the Attack Surface Keeps Growing
Feb 2026 synthesis explaining why the problem is structural and cannot be solved by filters: vendors face a direct tradeoff between blocking injections and preserving functionality. It also covers the Morris II AI worm as a concrete proof of super-linear propagation.
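
The "no more than two of (A), (B), (C)" constraint from the Agents Rule of Two entry above can be expressed as a simple configuration check. The sketch below is hypothetical: the `AgentCapabilities` class, its field names, and the example agents are illustrative assumptions, not part of Meta's framework or any published API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical capability flags for a single agent session."""

    processes_untrusted_input: bool  # (A) processes untrustworthy inputs
    accesses_sensitive_data: bool    # (B) has access to sensitive data
    changes_external_state: bool     # (C) can change state externally


def count_enabled(caps: AgentCapabilities) -> int:
    """Count how many of the three properties are enabled."""
    return sum(
        (
            caps.processes_untrusted_input,
            caps.accesses_sensitive_data,
            caps.changes_external_state,
        )
    )


def satisfies_rule_of_two(caps: AgentCapabilities) -> bool:
    """Return True when at most two of the three properties are enabled."""
    return count_enabled(caps) <= 2


# Reads untrusted email and private data, but cannot act externally.
email_triage = AgentCapabilities(True, True, False)

# All three properties enabled: violates the rule.
full_autonomy = AgentCapabilities(True, True, True)
```

A check of this kind could run when an agent session is configured, rejecting or escalating for human approval any configuration where `satisfies_rule_of_two` returns `False`.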

Tools

Agent Threat Rules (ATR)
Open detection standard for AI agent threats (prompt injection, tool poisoning, MCP attacks, skill compromise) — Sigma/YARA-style YAML rules. 330 rules across 9 attack categories with full mapping to OWASP Agentic Top 10 (10/10), MITRE ATLAS (100/113), NIST AI RMF (100%), and SAFE-MCP (78/85). 97.1% recall on the garak probe set (193 probes) and 0% false-positive on 53,577 real-world MCP skills. Shipped in production at Cisco AI Defense and Microsoft agent-governance-toolkit. Apache-2.0.
Augustus
Feb 2026 open-source tool from Praetorian. A single Go binary with 210+ vulnerability probes across 47 attack categories, 28 LLM providers, 90+ detectors, and 7 payload transformation buffs. Built for penetration testing workflows without Python/npm dependencies.
brood-box
Hardware-isolated microVM sandbox for running coding agents (Claude Code, Codex, OpenCode) with workspace snapshot isolation, DNS-aware egress control, and MCP authorization profiles to contain damage from prompt injection attacks.
Garak
Automates the search for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses in LLMs.
InjecGuard
Open-source prompt guard with published training data; achieves +30.8% over prior state-of-the-art on the NotInject benchmark, specifically addressing overdefense false positives that break legitimate use cases.
OWASP Agent Memory Guard

CTF

AI/LLM Exploitation Challenges
CTF challenges covering AI, ML, and LLMs.
ai-prompt-ctf by c-goosen
One of the few CTFs that tests indirect injection against tool-calling agents, spanning RAG, function calling, and ReAct agent scenarios using LlamaIndex, ChromaDB, GPT-4o, and Llama 3.2.
CrowdStrike AI Unlocked
Released Feb 2026, designed to train security, developer, and AI teams on prompt injection against increasingly capable agents. Built by CrowdStrike's Counter Adversary Operations team.
Damn Vulnerable LLM Agent
A sample chatbot powered by a ReAct agent, implemented with Langchain. It's designed to be an educational tool for security researchers, developers, and enthusiasts to understand and experiment with prompt injection attacks in ReAct agents.
Gandalf
Your goal is to make Gandalf reveal the secret password for each level. However, Gandalf will level up each time you guess the password, and will try harder not to give it away. Can you beat level 7? (There is a bonus level 8).
PromptTrace
Free AI security training platform with 7 hands-on prompt injection labs and a 15-level CTF (the Gauntlet) with progressively harder defenses — from prompt-level rules to code guards to LLM classifiers. Unique feature: Context Trace shows the full prompt stack (system prompt, RAG documents, tool definitions, user input) in real-time so you can see exactly how attacks work. Uses real LLMs from OpenAI, Anthropic, Google, Groq, and Cerebras.

Tutorials

AI Red Teaming from Google
Google's red team walkthrough of hacking AI systems.
How AI Prompt Injection Works | Hands-on with LLMs
Jan 2026 AppSecEngineer tutorial with a code-level demo of injecting against a real LLM application and live testing of LLM Guard detection. One of the most practical end-to-end tutorials published to date.
MCP Prompt Injection: How AI Gets Hacked
Nov 2025 hands-on walkthrough showing how prompt injection exploits tool metadata and trust boundaries in Model Context Protocol-integrated agents — the dominant new attack surface of 2025.
Prompt Injection
Prompt Injection tutorial from Learn Prompting.
Prompt Injection in LLM Agents (ReAct, Langchain)
Theory and a hands-on lab on prompt injection against Langchain ReAct agents.

Research Papers

Attention Tracker: Detecting Prompt Injection Attacks in LLMs
NAACL 2025 Findings paper detecting prompt injection by tracking attention distribution shifts — no modification to the underlying model required, making it deployable as a wrapper on any LLM.
Jailbreaking LLMs' Safeguard with Universal Magic Words for Text Embedding Models
Discovers that text embedding models have severely biased output distributions, and exploits this to find universal adversarial suffixes ("magic words") that bypass embedding-based LLM safeguards. Attacks transfer across models and languages; a train-free debiasing defense is also proposed.
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
This paper explores the concept of Indirect Prompt Injection attacks on Large Language Models (LLMs) through their integration with various applications. It identifies significant security risks, including remote data theft and ecosystem contamination, present in both real-world and synthetic applications.
Prompt Injection 2.0: Hybrid AI Threats
Jul 2025 paper showing how prompt injections now combine with XSS, CSRF, AI worm propagation, and multi-agent infections to evade traditional WAFs entirely. Evaluates Preamble's classifier, data-tagging, and RL-based defenses against these hybrid scenarios.
Safety in Embodied AI: Risks, Attacks, and Defenses
A comprehensive survey of 500+ papers covering prompt injection and other attack vectors in embodied AI systems across the full pipeline (perception, cognition, planning, action, agentic). Includes a 5-layer threat taxonomy mapping where new capabilities introduce new attack surfaces.
Securing AI Agents Against Prompt Injection Attacks
Nov 2025 benchmark of 847 adversarial test cases across 5 attack categories against 7 LLMs. The combined defense framework reduces attack success from 73.2% to 8.7% while retaining 94.3% of baseline task performance.
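
To put the figures quoted in the last entry above in perspective, the arithmetic below derives the absolute and relative reduction in attack success and the share of task performance given up. It uses only the three percentages stated in that entry, and the rounding is done here.

```latex
\begin{align*}
\text{absolute reduction} &= 73.2\% - 8.7\% = 64.5 \text{ percentage points} \\
\text{relative reduction} &= \frac{73.2 - 8.7}{73.2} \approx 0.881 = 88.1\% \\
\text{task performance lost} &= 100\% - 94.3\% = 5.7\% \text{ of baseline}
\end{align*}
```

Read this way, roughly one in twelve attacks still succeeds under the combined defense, which is consistent with the list's recurring point that such defenses reduce risk rather than eliminate it.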

Community

Learn Prompting
Discord server from Learn Prompting.
MITRE ATLAS
MITRE's adversarial ML threat matrix formally cataloguing direct and indirect prompt injection as core adversary techniques, enabling integration into enterprise threat modelling and purple team exercises.
OWASP Gen AI Security Project
The authoritative standards body maintaining prompt injection as LLM Risk #1, with continuously updated attack patterns, mitigations, and real-world scenarios contributed by practitioners across the industry.
r/llmsecurity
The most active subreddit dedicated to LLM security research; a good early-warning channel for real-world incidents and new disclosures.
Simon Willison's Blog
The most consistent independent tracker of real-world prompt injection incidents, new papers, and tooling across the field.

Showing a sample of 52 resources. View the full list on GitHub →