LLM prompt injection
adversarial input — user prompt, retrieved doc, MCP tool result, uploaded attachment — manipulates an LLM into ignoring its system prompt or executing unintended actions. evidence is the attempt log, the matched pattern cluster, the indirect-injection carrier artifact, and the guardrail bypass score. distinct from ai-agent-runaway (autonomous scope creep with a benign prompt) and insider-threat (human actor with no model in the path).
a guided path, not automation — each step opens a tool you run yourself; nothing uploads. progress is saved only in this browser.
wraps the LLM prompt injection — attempt log kit preset — drop injection attempt logs + chat exports → parse attempts → jailbreak cluster → guardrail bypass → report
guided steps
- llm prompt injection attempt log forensic analyzer
parse user turn + matched pattern + model response rows
- llm jailbreak conversation artifact detector
detect jailbreak conversation artifacts in chat exports
- chatbot jailbreak pattern cluster detector
cluster jailbreak templates + success rate across sessions
- llm guardrail bypass score anomaly detector
detect guardrail score manipulation + threshold edge cases
- case report generator
draft report linking injection attempts to guardrail bypass events
suggested options · title: LLM prompt injection — attempt log assessment
when you're done
export a run summary — a small JSON record of which steps you marked done, your notes, and a self-hash so the record can't be silently altered. it is your reproducibility note, not a per-tool receipt: each tool emits its own input→output receipt when you run it.