mcp server compromise vs prompt injection
an LLM produced an unexpected response or tool call. prompt injection is adversarial input — user prompt · RAG chunk · uploaded attachment — bending an honest model running against an honest server. server compromise is a malicious MCP server feeding tampered tool results or definitions back to a clean model; the model is faithful, the input is honest, but the server pipeline lies. attempt-log pattern matching solves one; server audit-log diffing solves the other.
primary tools · side by side
ordered entry points from the case-type taxonomy. highlighted rows appear in both case types' editorial tool lists.
MCP server compromise
the MCP (Model Context Protocol) server itself is the failure locus — leaked server credentials, impersonated server identity, server-side tool-definition tampering, or permission escalation in the server's tool-grant ledger. evidence is the server audit log, the client-invocation trail showing what the LLM thinks it called vs what the server actually executed, the tool-call attribution graph, and the OAuth scope grant ledger. distinct from ai-agent-runaway (agent did this with a benign server) and llm-prompt-injection (input bent the model · server was clean). a compromised server can fool both honest models and honest agents.
- 01mcp model context protocol server audit log forensic analyzerdrop mcp server audit log · parse tool calls + resource accesses + auth · runs locally
- 02mcp client invocation log forensic analyzerdrop mcp client invocation log · parse server calls + arguments + responses · runs locally
- 03mcp server permission escalation detectordrop mcp server audit log · detect over-permissioned tool exposure · runs locally
- 04mcp tool call graph reconstructordrop mcp client + server log set · reconstruct tool-call dependency graph · runs locally
- 05mcp prompt injection via tool result detectordrop mcp server tool result log · detect injection payloads in tool responses · runs locally
- 06anthropic mcp claude tool call attribution tooldrop claude tool call log · attribute each tool call to model decision · runs locally
LLM prompt injection
adversarial input — user prompt, retrieved doc, MCP tool result, uploaded attachment — manipulates an LLM into ignoring its system prompt or executing unintended actions. evidence is the attempt log, the matched pattern cluster, the indirect-injection carrier artifact, and the guardrail bypass score. distinct from ai-agent-runaway (autonomous scope creep with a benign prompt) and insider-threat (human actor with no model in the path).
- 01llm prompt injection attempt log forensic analyzerdrop llm api/chat injection log export · parse user turn + matched pattern + model response · runs locally
- 02prompt injection attempt detector in uploaded docdrop pdf / docx / image · detect known prompt-injection payload patterns · runs locally
- 03indirect prompt injection document artifact detectordrop uploaded doc + chat export · detect hidden instruction payloads in attachments · runs locally
- 04mcp prompt injection via tool result detectordrop mcp server tool result log · detect injection payloads in tool responses · runs locally
- 05rag prompt injection via retrieved doc detectordrop retrieved docs · detect injection payloads in retrievals · runs locally
- 06llm jailbreak conversation artifact detectorscan conversation exports for dan · roleplay bypass · injection patterns · severity · export csv · runs locally
- 07llm guardrail bypass score anomaly detectordrop safety classifier log export · detect score manipulation + threshold edge cases · runs locally
editorial overlap
lean toward…
disambiguation signals derived from case-type descriptions and common practitioner confusion points.
lean toward mcp server compromise if you see…
- MCP server-side tool-result tampering — the result returned from the server contains content the underlying data source did not produce · server log shows tool-result rewrite between data layer and client response
- server tool-definition drift or capability-list change between client sessions without an attributable admin event — model received different tool semantics on different calls
- OAuth scope grant ledger on the server shows escalation or token-handler edit · server impersonation signal in TLS pinning or server-identity attestation logs
lean toward prompt injection if you see…
- matched injection pattern in user-turn logs · jailbreak template cluster · adversarial turn sequence in LLM attempt logs driving model output against a server whose audit log is clean
- indirect-injection carrier artifact — uploaded doc · RAG chunk · MCP tool result containing override text whose source was the underlying data, not server-side rewrite
- guardrail bypass score anomaly, system-prompt exfil attempt, or red-team eval log showing the input stream — not the server pipeline — was the attack vector