apex-prompt-injection — indirect injection via RAG-retrieved document
Apex Financial assistant 'FinBot': a PDF filed in the knowledge base carried an indirect injection payload in its metadata. The payload overrode the system prompt and exfiltrated the session context to an attacker-controlled webhook. Seven injection attempts were logged across three user sessions. Fully synthetic.
what this proves
- every primary engine produces deterministic, fixture-locked output, verified by npm run check:flagship (7/7).
- every output is generated 100% locally in your browser: no upload, no server-side processing of your evidence.
- the full case binder is built from these outputs without uploading a single byte — click below to generate it locally.
primary engines locked to this fixture
- llm-prompt-injection-attempt-log-forensic-analyzer
- prompt-injection-attempt-detector-in-uploaded-doc
- indirect-prompt-injection-document-artifact-detector
- mcp-prompt-injection-via-tool-result-detector
- rag-prompt-injection-via-retrieved-doc-detector
- llm-jailbreak-conversation-artifact-detector
- llm-guardrail-bypass-score-anomaly-detector
build the case binder
one click runs all primary engines on the synthetic evidence, assembles findings into a self-contained html binder, and opens it in a new tab. print to pdf from there — still zero upload.
runs all 7 primary engines locally on the synthetic evidence zip · opens a self-contained html binder · no upload
download the synthetic evidence
MIT-licensed, fully synthetic, safe to attach to a PR or send to a reviewer. Compare your local runs against the published goldens.
built deterministically from scripts/fixtures/build-apex-prompt-injection.mjs. seed: apex-prompt-injection:v1.
methodology
indirect injection cases require you to find the carrier artifact before you analyze the attempt log. the PDF metadata payload is the injection vector; the RAG retrieval log is the delivery mechanism. start with the attempt log analyzer to cluster the seven attempts by session, then trace each cluster back to the retrieved document that carried the payload. the guardrail bypass score closes the loop: if the score is non-zero, the system prompt override succeeded at least once.
read the full LLM prompt injection guide →
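the methodology can be sketched in a few lines of JavaScript. this is a minimal illustration, not the engines' actual code: the record shape (sessionId, retrievedDocId, bypassScore), the function names, and the sample rows are all assumptions invented for the example, and the real engine exports may look different.

```javascript
// Hypothetical attempt-log rows. Field names and values are invented for
// illustration and are not taken from the published fixture.
const attempts = [
  { id: "a1", sessionId: "s1", retrievedDocId: "kb/example.pdf", bypassScore: 0 },
  { id: "a2", sessionId: "s1", retrievedDocId: "kb/example.pdf", bypassScore: 0 },
  { id: "a3", sessionId: "s1", retrievedDocId: "kb/example.pdf", bypassScore: 0.42 },
  { id: "a4", sessionId: "s2", retrievedDocId: "kb/example.pdf", bypassScore: 0 },
  { id: "a5", sessionId: "s2", retrievedDocId: "kb/example.pdf", bypassScore: 0 },
  { id: "a6", sessionId: "s3", retrievedDocId: "kb/example.pdf", bypassScore: 0 },
  { id: "a7", sessionId: "s3", retrievedDocId: "kb/example.pdf", bypassScore: 0.17 },
];

// Step 1: cluster the attempts by session.
function clusterBySession(rows) {
  const clusters = new Map();
  for (const row of rows) {
    if (!clusters.has(row.sessionId)) clusters.set(row.sessionId, []);
    clusters.get(row.sessionId).push(row);
  }
  return clusters;
}

// Steps 2 and 3: trace each cluster back to its carrier document(s) and
// treat any non-zero bypass score as evidence that the override succeeded.
function traceCarriers(clusters) {
  const summary = [];
  for (const [sessionId, rows] of clusters) {
    summary.push({
      sessionId,
      attemptCount: rows.length,
      carriers: [...new Set(rows.map((r) => r.retrievedDocId))],
      overrideSucceeded: rows.some((r) => r.bypassScore > 0),
    });
  }
  return summary;
}

const sessionSummary = traceCarriers(clusterBySession(attempts));
console.log(sessionSummary);
```

with the sample rows above, every session traces back to the same carrier document, and the sessions containing a non-zero score are the ones flagged as successful overrides.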
after the playbook
export findings from each primary engine, then drop every csv/json into fatcousin-multi-tool-super-timeline-correlator. one timeline across document ingestion, injection attempts, and exfil webhook calls — still zero upload.
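to make the idea of a single timeline concrete, here is a rough sketch of merging several exports by timestamp. it does not describe how fatcousin-multi-tool-super-timeline-correlator works internally; the buildTimeline function, the timestamp and label fields, and the sample events are assumptions made up for illustration.

```javascript
// Merge rows from several hypothetical exports into one list ordered by time.
// Each source is assumed to be an array of { timestamp, label } objects with
// ISO 8601 timestamps; real export schemas may differ.
function buildTimeline(sources) {
  const events = [];
  for (const [source, rows] of Object.entries(sources)) {
    for (const row of rows) {
      const ts = Date.parse(row.timestamp);
      if (Number.isNaN(ts)) continue; // skip rows without a parsable timestamp
      events.push({ source, ts, label: row.label });
    }
  }
  return events.sort((a, b) => a.ts - b.ts);
}

// Invented sample events covering the three stages named above.
const timeline = buildTimeline({
  attempts: [{ timestamp: "2024-01-01T09:05:00Z", label: "injection attempt" }],
  webhook: [{ timestamp: "2024-01-01T09:06:00Z", label: "exfil webhook call" }],
  ingestion: [{ timestamp: "2024-01-01T09:00:00Z", label: "document ingested" }],
});

for (const e of timeline) {
  console.log(new Date(e.ts).toISOString(), e.source, e.label);
}
```

sorting on a parsed numeric timestamp rather than on the raw string keeps the ordering correct even if the exports use different but valid ISO 8601 forms.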