agent runaway vs prompt injection
security sees an LLM integration misbehaving. runaway is autonomous re-planning by an agent that received a benign prompt — failure is in the tool-call chain. prompt injection is adversarial user/document/retrieved input bending the model — failure is in the input stream. wrong call sends you to MCP trace reconstruction when you need attempt-log pattern matching, or vice versa.
primary tools · side by side
ordered entry points from the case-type taxonomy. highlighted rows appear in both case types' editorial tool lists.
AI agent runaway action
an autonomous agent (Claude · GPT · Gemini · Copilot · custom MCP) takes actions outside its prompt scope — reads credentials it shouldn't, exfils data, installs persistence, calls an MCP tool a human wouldn't have approved. evidence is the tool-call trace, the prompt-action divergence, and the OAuth grant ledger — not the prompt itself. distinct from llm-prompt-injection (malicious input) and insider-threat (human actor).
- 01ai agent tool call execution trace reconstructordrop agent run log · reconstruct tool-call sequence + state mutations · runs locally
- 02ai agent prompt vs action divergence detectordrop agent run log · detect actions taken inconsistent with prompt · runs locally
- 03ai agent autonomous action accountability tracerdrop agent run log · trace responsibility for each autonomous action · runs locally
- 04ai agent credential handling auditdrop agent run log · audit credential usage + leakage risk · runs locally
- 05mcp tool call graph reconstructordrop mcp client + server log set · reconstruct tool-call dependency graph · runs locally
- 06ai agent persistence mechanism detectordrop agent + system state · detect persistence implanted by agent · runs locally
- 07ai agent network exfiltration pattern detectordrop agent network log · detect data exfiltration via agent · runs locally
LLM prompt injection
adversarial input — user prompt, retrieved doc, MCP tool result, uploaded attachment — manipulates an LLM into ignoring its system prompt or executing unintended actions. evidence is the attempt log, the matched pattern cluster, the indirect-injection carrier artifact, and the guardrail bypass score. distinct from ai-agent-runaway (autonomous scope creep with a benign prompt) and insider-threat (human actor with no model in the path).
- 01llm prompt injection attempt log forensic analyzerdrop llm api/chat injection log export · parse user turn + matched pattern + model response · runs locally
- 02prompt injection attempt detector in uploaded docdrop pdf / docx / image · detect known prompt-injection payload patterns · runs locally
- 03indirect prompt injection document artifact detectordrop uploaded doc + chat export · detect hidden instruction payloads in attachments · runs locally
- 04mcp prompt injection via tool result detectordrop mcp server tool result log · detect injection payloads in tool responses · runs locally
- 05rag prompt injection via retrieved doc detectordrop retrieved docs · detect injection payloads in retrievals · runs locally
- 06llm jailbreak conversation artifact detectorscan conversation exports for dan · roleplay bypass · injection patterns · severity · export csv · runs locally
- 07llm guardrail bypass score anomaly detectordrop safety classifier log export · detect score manipulation + threshold edge cases · runs locally
editorial overlap
lean toward…
disambiguation signals derived from case-type descriptions and common practitioner confusion points.
lean toward agent runaway if you see…
- autonomous tool-call chain with no matched jailbreak pattern or adversarial template in user-turn logs — the original prompt was bounded
- prompt-vs-action divergence on agent steps where stated_intent was benign but actual_action was out-of-scope (exfil · persistence · credential read)
- MCP tool-call graph or agent persistence (cron · webhook) added while the deploying operator was offline — no model-input manipulation in the path
lean toward prompt injection if you see…
- matched injection pattern, jailbreak template cluster, or adversarial turn sequence in LLM attempt logs driving model output
- indirect-injection carrier artifact — uploaded doc · retrieved RAG chunk · MCP tool result — containing imperative override text that the model then followed
- guardrail bypass score anomaly, system-prompt exfil attempt, or red-team evaluation log showing the input was the attack vector — not autonomous replanning