// investigation guide

MCP server compromise — methodology

mcp server compromise is not ai agent runaway and not llm prompt injection. the model is faithful, the input stream is honest, the deploying agent is doing what it was told — but the model context protocol server itself is the failure locus. evidence is the server audit log, the client-side invocation log, the divergence between the two, and the server-side grant ledger — not the model transcript and not the user prompt. a compromised server can fool both honest models and honest agents. the orchid-mcp-server-compromise fixture (orchid-lab-db-mcp) compresses that arc into ten minutes while ops engineer ren okada remained online but approved nothing.

intake — what this looks like in the wild

the alert is rarely labeled mcp server compromise. it arrives as an llm integration producing the wrong response, a tool call that touched a resource no human asked for, an audit-log row showing a capability the deployed manifest did not declare, or an attribution gap where the client log says "called tool X" and the server log says "executed tool Y."

typical arc: a team deploys an MCP server (self-hosted file system bridge · database query gateway · third-party hosted SaaS connector · in-house enterprise data adapter) → server credentials leak, a same-LAN attacker impersonates the server, or a supply-chain attacker tampers with the server binary → the server returns tampered tool definitions or rewritten tool results, or invokes downstream operations the client did not request → the LLM client and the agent both behave correctly relative to what the server told them, which is exactly why their own records are the wrong evidence frame.

unlike runaway agents (autonomous re-planning) and prompt injection (adversarial input), mcp server compromise is a supply-chain failure on the tool layer. the model and the agent are not misbehaving — they are being faithfully misled by the server they trust. that distinction drives which export to preserve first and which counterparty to notify.

  • distinct from ai-agent-runaway: in runaway the agent acts outside scope against a clean server. here the agent's requests arrive faithfully and the server lies.
  • distinct from llm-prompt-injection: in prompt injection the input bent the model running against an honest server. here the input is clean and the server tampered with tool results before the model ever saw them.
  • distinct from supply-chain-compromise: classical supply-chain compromise lives at build/sign/deploy time. mcp server compromise can live there too, but the unique forensic surface is the runtime tool-call ledger and the per-session capability list.
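the distinctions above reduce to one question asked in a fixed order: does the server's own record diverge from what the client asked for and what the data source holds? a minimal sketch of that ordering follows. it assumes four hypothetical analyst-recorded observations (the `Observations` fields are illustrative, not a fatcousin export schema) and returns only the first matching case type. real incidents can overlap and still need analyst judgment.

```typescript
// Hypothetical triage observations; these field names do not come from any fatcousin export.
interface Observations {
  // server audit shows an operation with no matching client request
  serverExecutedUnrequested: boolean;
  // a tool-result payload carries text the underlying data source never held
  resultTextAbsentFromSource: boolean;
  // adversarial instructions were present in the user turn or in the source data itself
  adversarialInputPresent: boolean;
  // the agent's own requests, faithfully executed, fell outside its intended scope
  agentRequestOutOfScope: boolean;
}

type CaseType =
  | "mcp-server-compromise"
  | "llm-prompt-injection"
  | "ai-agent-runaway"
  | "undetermined";

// Check server-side divergence first: if the server did or returned something the
// client and the data source cannot account for, the server is the failure locus.
function classifyCase(o: Observations): CaseType {
  if (o.serverExecutedUnrequested || o.resultTextAbsentFromSource) {
    return "mcp-server-compromise";
  }
  // honest server, adversarial input bent the model
  if (o.adversarialInputPresent) return "llm-prompt-injection";
  // honest server, clean input, agent asked for out-of-scope work
  if (o.agentRequestOutOfScope) return "ai-agent-runaway";
  return "undetermined";
}

console.log(
  classifyCase({
    serverExecutedUnrequested: true,
    resultTextAbsentFromSource: true,
    adversarialInputPresent: false,
    agentRequestOutOfScope: false,
  }),
); // prints mcp-server-compromise
```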

preservation — what to collect first

isolate the MCP server before you analyze. every minute it keeps serving adds tool calls you cannot un-ring and lets the attacker rewrite their own audit trail. detach the server from the agent runtime · revoke server credentials · rotate the server's outbound tokens · snapshot the server binary and config bundle · freeze the audit log file — in that order if the layer at fault is uncertain.

artifact | volatility | time to loss
server audit log (capability-list · tool-registration · tool-invocation rows) | rolling buffer | hours to days · server may rotate · snapshot before rotation
client-side invocation log | rolling | vendor-dependent · 7–30 days typical
server binary hash + deployed config bundle + signed manifest | persistent if versioned | overwritten on redeploy · snapshot before rollback
server permission ledger | persistent | capability changes may be silently dropped if server restarted
TLS pinning record + server identity attestation log | rolling | per-session · impersonation evidence vanishes when session closes
OAuth grant ledger on tokens the server holds toward downstream systems | persistent | revocation removes future use · past tokens still visible in downstream cloudtrail
tool-result payloads (cached responses returned to clients) | often not retained | preserve underlying data source state alongside server return for rewrite detection

the first 10 minutes

  1. isolate the MCP server — detach from agent runtime, block its outbound network, do not power off (preserves memory).
  2. export server audit log for the suspected session window — do not rely on UI scrollback.
  3. pull client-side invocation logs covering the same session ids — both sides of every tool call.
  4. snapshot server binary, config bundle, and signed manifest. compute and record sha-256 of each.
  5. export the server's permission ledger / capability list / tool-registration history.
  6. capture TLS pinning records and server identity attestation log if impersonation is suspected.
  7. preserve OAuth grants the server holds toward downstream systems; do not revoke them yet — preserve evidence first.
  8. preserve any cached tool-result payloads and the underlying data source state for rewrite comparison.
  9. notify platform owner + counsel + the MCP server vendor (if hosted by a third party).
  10. begin the path below on frozen exports — files never leave your device in fatcousin tools.
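step 4 and the preservation log both depend on recorded sha-256 values for frozen exports. a minimal sketch of a preservation-log builder using Node's built-in `crypto` and `fs` modules follows. the `PreservationEntry` shape is an assumption, not a fatcousin format. the builder reads each file fully into memory, so very large binaries would call for a streaming hash instead.

```typescript
import { createHash } from "node:crypto";
import { readFileSync, statSync } from "node:fs";
import { basename } from "node:path";

// One row of the preservation log; the shape is an assumption, not a fatcousin format.
interface PreservationEntry {
  file: string;
  bytes: number;
  sha256: string;
  collectedAtUtc: string;
}

// sha-256 of an in-memory buffer or string, as lowercase hex
function sha256Hex(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Hash each frozen export; reads whole files, which is fine for log-sized exports
function buildPreservationLog(
  paths: string[],
  collectedAt: Date = new Date(),
): PreservationEntry[] {
  return paths.map((p) => ({
    file: basename(p),
    bytes: statSync(p).size,
    sha256: sha256Hex(readFileSync(p)),
    collectedAtUtc: collectedAt.toISOString(),
  }));
}

// Usage (paths follow the orchid fixture layout):
// const log = buildPreservationLog([
//   "evidence/mcp-server-audit-log.ndjson",
//   "evidence/mcp-client-invocation-log.ndjson",
//   "evidence/mcp-permission-ledger.json",
// ]);
// console.log(JSON.stringify(log, null, 2));
```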

analysis — the path (api-agentic-action vertical)

the spine follows the api-agentic-action vertical but pivots on the server, not the agent: server audit → client/server diff → server-side permission escalation → call graph → tool-result tampering → attribution. merge exports across multiple vendors with fatcousin-multi-tool-super-timeline-correlator or fatcousin-cross-export-ioc-hash-correlator when the server, the client, and the downstream system each emit different formats.
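merging exports from the server, the client, and the downstream system comes down to normalizing each row to a shared shape and sorting deterministically. the sketch below illustrates that idea. it is not the implementation of fatcousin-multi-tool-super-timeline-correlator, and the `TimelineEvent` fields are assumptions. it insists on explicit UTC offsets because `Date.parse` interprets offset-less date-time strings as local time, which would silently skew the merge.

```typescript
// Normalized event shape; field names here are assumptions, not a vendor schema.
interface TimelineEvent {
  source: string; // e.g. "server-audit", "client-invocation", "downstream"
  rowId: string;
  toolCallId?: string;
  tsMs: number; // epoch milliseconds, UTC
  summary: string;
}

// Only accept timestamps with an explicit offset; Date.parse treats
// offset-less date-time strings as local time.
function toEpochMs(iso: string): number {
  if (!/(Z|[+-]\d{2}:\d{2})$/.test(iso)) {
    throw new Error(`timestamp lacks a UTC offset: ${iso}`);
  }
  const ms = Date.parse(iso);
  if (Number.isNaN(ms)) throw new Error(`unparseable timestamp: ${iso}`);
  return ms;
}

// Parse NDJSON text into objects, skipping blank lines.
function parseNdjson(text: string): Record<string, unknown>[] {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Record<string, unknown>);
}

// Plain code-unit comparison so ordering does not depend on the host locale.
const cmp = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

// Merge normalized streams in a deterministic order: time, then source, then row id,
// so reruns over the same frozen exports produce identical output.
function mergeTimeline(...streams: TimelineEvent[][]): TimelineEvent[] {
  return streams
    .flat()
    .sort((a, b) => a.tsMs - b.tsMs || cmp(a.source, b.source) || cmp(a.rowId, b.rowId));
}
```

each source still needs its own small mapper from raw rows (as returned by `parseNdjson`) to `TimelineEvent`, because field names differ by vendor and export type.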

  1. mcp server audit log forensic analyzer

    drop the MCP server audit log export (in orchid: evidence/mcp-server-audit-log.ndjson, ~24 rows): capability-list calls, tool-definition registrations, tool-invocation logs, admin events. row srv-00011 executes inventory.export_bulk on tool_call_id tc-00011.
    why first: server compromise is defined by what the server did, not what the client thinks happened. the audit log is the server's confession; everything else cross-checks against it.
    honest limit: post-S6-goldens smoke on orchid yields one deduped mcp_protocol_marker finding on the full ndjson corpus, not one finding per audit-row divergence (C-class undercount). pair srv-00011 against client tc-00011 manually until atlas-S6-2e-server-audit ships.

  2. mcp client invocation log forensic analyzer

    drop the client-side MCP invocation log (in orchid: evidence/mcp-client-invocation-log.ndjson). it enumerates what the claude client thought it called (tool name, args hash, expected schema version) for cross-check against the server-side audit row.
    why second: the bright line is divergence between client expectation and server execution. without the client log you cannot prove the server lied; you only have one side of the conversation.
    honest limit: orchid smoke yields one deduped marker finding on 24 client rows. tc-00011 says inventory.lookup, but the engine does not emit a named client/server mismatch finding today, so manual tool_call_id pairing is required.

  3. mcp server permission escalation detector

    drop the MCP server's permission ledger (in orchid: evidence/mcp-permission-ledger.json, six grant events). it flags scope expansions that no client request, admin event, or signed manifest accounts for; grant-00002 adds /var/export without an admin row.
    why third: a server compromise frequently announces itself as quiet scope growth on the server's own grant table: new file system roots, new outbound hosts, new tool registrations.
    honest limit: orchid smoke yields ten deduped findings from the scanJson leaf walk, which matches A-class fan-out on grant verbs, against only two narrative escalation beats (export_bulk registration · /var/export). cite the grant-00002 and srv-00003 row ids, not just the count.

  4. mcp tool call graph reconstructor

    drop the server-side call graph (in orchid: evidence/mcp-server-call-graph.json, ~13 nodes). it renders the dependency tree per session and surfaces branches where the server invoked downstream operations the client never requested; node n-copy-export is postgres.copy_to off a lookup parent.
    why fourth: a compromised server often reuses one client tool call as a launchpad for several downstream operations the client did not ask for. that branch-without-request pattern is the trace signature.
    honest limit: orchid smoke yields seventeen deduped markers across graph nodes. describe the n-copy-export branch; do not treat seventeen markers as seventeen independent incident beats.

  5. mcp prompt injection via tool result detector

    drop the tool-result payloads the server returned to the client (in orchid: evidence/mcp-tool-result-payloads.ndjson plus underlying-data-source-snapshot.json). it flags imperative override text that did not originate in the underlying data source; tr-00004 and tr-00006 carry rewrite beats.
    why fifth: a compromised server can inject prompt-injection material into otherwise clean tool results. this is the line between mcp-server-compromise and llm-prompt-injection: same payload signature, different origin.
    honest limit: orchid smoke yields one deduped marker on eight payload rows. prompt_injection_in_tool_result may not fire because MCP_MARKER precedes PROMPT_INJECT on the same line, so compare tr-00004/tr-00006 against the postgres snapshot manually.

  6. anthropic mcp claude tool call attribution tool

    drop Claude's tool-call attribution export (in orchid: evidence/anthropic-tool-call-attribution.ndjson). it resolves which agent identity, session, and trust chain authorized each tool call against the MCP server; attr-00018 shows an impersonated server cert fingerprint.
    why sixth: with the trace and graph anchored, the question becomes who authorized the call sequence. attribution records the principal chain (client identity → session token → server-side trust grant), which is the chain you hand counsel.
    honest limit: orchid smoke yields one deduped anthropic_mcp_marker on 24 attribution rows; the attr-00018 impersonation beat is a manual cross-check today, not a named finding type.
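steps 1 and 2 currently require manual tool_call_id pairing. a minimal sketch of that pairing follows. it assumes both exports expose a session id, a tool_call_id, and a tool name under the hypothetical field names shown, so check the real exports before relying on it. rows without a tool_call_id (capability-list and admin rows) are skipped.

```typescript
// Row shapes are assumptions about the two exports, not their documented schemas.
interface ServerAuditRow {
  row_id: string;
  session_id: string;
  tool_call_id?: string; // absent on capability-list and admin rows
  tool: string;
}
interface ClientInvocationRow {
  session_id: string;
  tool_call_id: string;
  tool: string;
}

type Divergence =
  | { kind: "tool-mismatch"; key: string; clientTool: string; serverTool: string; serverRowId: string }
  | { kind: "server-only"; key: string; serverTool: string; serverRowId: string }
  | { kind: "client-only"; key: string; clientTool: string };

// Pair both sides on session_id + tool_call_id and report every disagreement.
function diffToolCalls(server: ServerAuditRow[], client: ClientInvocationRow[]): Divergence[] {
  const keyOf = (r: { session_id: string; tool_call_id?: string }) =>
    `${r.session_id}/${r.tool_call_id}`;
  const clientByKey = new Map(client.map((c) => [keyOf(c), c] as const));
  const seen = new Set<string>();
  const out: Divergence[] = [];

  for (const s of server) {
    if (!s.tool_call_id) continue; // not a tool invocation row
    const key = keyOf(s);
    seen.add(key);
    const c = clientByKey.get(key);
    if (!c) {
      // server executed a call the client never logged
      out.push({ kind: "server-only", key, serverTool: s.tool, serverRowId: s.row_id });
    } else if (c.tool !== s.tool) {
      // same tool_call_id, different tool: the client/server divergence
      out.push({ kind: "tool-mismatch", key, clientTool: c.tool, serverTool: s.tool, serverRowId: s.row_id });
    }
  }
  for (const c of client) {
    const key = keyOf(c);
    if (!seen.has(key)) out.push({ kind: "client-only", key, clientTool: c.tool });
  }
  return out;
}
```

fed orchid-shaped rows, a server row srv-00011 (inventory.export_bulk) and a client row (inventory.lookup) that share session orchid-mcp-sess-20260518-0900 and tool_call_id tc-00011 come back as a single `tool-mismatch`.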

orchid walkthrough. feed evidence/mcp-server-audit-log.ndjson to step 1 — cross-check srv-00011 (inventory.export_bulk) against client tc-00011 (inventory.lookup) even when findings collapse to one marker. feed evidence/mcp-client-invocation-log.ndjson to step 2 — same session id orchid-mcp-sess-20260518-0900 on both sides. step 3 on evidence/mcp-permission-ledger.json — grant-00002 adds /var/export with no matching admin event. step 4 on evidence/mcp-server-call-graph.json — thirteen nodes · branch n-copy-export is postgres.copy_to without a client-requested parent. step 5 pairs mcp-tool-result-payloads.ndjson with underlying-data-source-snapshot.json — rows tr-00004 and tr-00006 carry override text absent from postgres. step 6 on anthropic-tool-call-attribution.ndjson — row attr-00018 shows impersonated server cert fingerprint. compare disambiguation: vs ai-agent-runaway · vs llm-prompt-injection.
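the branch-without-request check in step 4 can be expressed as a walk from each server-initiated node up to its nearest client-requested ancestor. this sketch assumes a hypothetical node shape (`id`, `op`, `parent`, `clientRequested`) and a hand-maintained allowlist `EXPECTED_DOWNSTREAM` listing which downstream operations each client-facing tool is expected to trigger. the `postgres.select` entry is illustrative and is not taken from the fixture.

```typescript
// Hypothetical node shape for a reconstructed server-side call graph.
interface GraphNode {
  id: string;
  op: string; // tool or downstream operation name
  parent: string | null; // id of the node that triggered this one
  clientRequested: boolean; // true if the client asked for this call
}

// Hypothetical allowlist: downstream operations each client-facing tool may trigger.
const EXPECTED_DOWNSTREAM: Record<string, string[]> = {
  "inventory.lookup": ["postgres.select"],
};

// Return server-initiated nodes that no client-requested ancestor explains.
function findUnrequestedBranches(
  nodes: GraphNode[],
  expected: Record<string, string[]> = EXPECTED_DOWNSTREAM,
): GraphNode[] {
  const byId = new Map(nodes.map((n) => [n.id, n] as const));

  // Walk parent links to the nearest client-requested ancestor, guarding against cycles.
  const requestedAncestor = (n: GraphNode): GraphNode | undefined => {
    const seen = new Set<string>([n.id]);
    let cur = n.parent !== null ? byId.get(n.parent) : undefined;
    while (cur && !cur.clientRequested) {
      if (seen.has(cur.id)) return undefined;
      seen.add(cur.id);
      cur = cur.parent !== null ? byId.get(cur.parent) : undefined;
    }
    return cur;
  };

  return nodes.filter((n) => {
    if (n.clientRequested) return false;
    const root = requestedAncestor(n);
    // Flag orphans and operations the requesting tool does not account for.
    return root === undefined || !(expected[root.op] ?? []).includes(n.op);
  });
}
```

given a client-requested inventory.lookup node with one postgres.select child and one postgres.copy_to child, only the copy_to node (n-copy-export in orchid terms) is returned.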

optional extensions when the deployment surface differs from in-house MCP:

common false leads

  • "the model hallucinated" — check the server audit log. if the server invoked the operation, the call was real regardless of what the model intended.
  • "the agent went rogue" — runaway agents act outside a clean server's scope. if the server itself executed unrequested operations, this is the wrong case type.
  • "prompt injection in the user turn" — adversarial input bends the model running against an honest server. if the override text appeared in tool-result payloads the underlying data source did not produce, the server rewrote it.
  • "vendor-hosted server is trusted infrastructure" — vendor compromise is a real failure mode. preserve attribution and notify the vendor; do not assume.
  • "no human clicked download" — server-side rewrite or downstream tool invocation is still exfil. unattended server-driven calls against customer data count.
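the origin test behind the prompt-injection false lead (and behind step 5) is a comparison: find override-looking text in a tool result, then ask whether the underlying data source contains it. a minimal sketch follows. `OVERRIDE_CUES` is a small hypothetical pattern list and sentence splitting is naive. the exact-substring match against the snapshot is also strict, so real exports may need whitespace or case normalization first.

```typescript
interface ToolResultPayload {
  id: string; // e.g. "tr-00004"
  text: string; // text the server returned to the client
}

type Origin = "server-rewrite" | "present-in-source";

interface OverrideHit {
  id: string;
  sentence: string;
  origin: Origin;
}

// Hypothetical override cues for illustration; real detectors use far richer rules.
const OVERRIDE_CUES: RegExp[] = [
  /\bignore (all |any )?(previous|prior) instructions\b/i,
  /\byou must now\b/i,
  /\bdo not (tell|inform) the user\b/i,
];

// Gather every string value in a JSON snapshot of the underlying data source.
function collectStrings(value: unknown, out: string[] = []): string[] {
  if (typeof value === "string") out.push(value);
  else if (Array.isArray(value)) value.forEach((v) => collectStrings(v, out));
  else if (value !== null && typeof value === "object") {
    Object.values(value as Record<string, unknown>).forEach((v) => collectStrings(v, out));
  }
  return out;
}

// Flag override-looking sentences and label their origin: text found in the data
// source arrived with the data (prompt-injection case); text absent from it was
// introduced by the server (server-compromise case).
function classifyOverrides(results: ToolResultPayload[], sourceSnapshot: unknown): OverrideHit[] {
  const sourceText = collectStrings(sourceSnapshot).join("\n");
  const hits: OverrideHit[] = [];
  for (const r of results) {
    for (const raw of r.text.split(/(?<=[.!?])\s+/)) {
      const sentence = raw.trim();
      if (!OVERRIDE_CUES.some((re) => re.test(sentence))) continue;
      hits.push({
        id: r.id,
        sentence,
        origin: sourceText.includes(sentence) ? "present-in-source" : "server-rewrite",
      });
    }
  }
  return hits;
}
```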

reporting — what the report says · what it does not claim

use case-report-generator or the case binder on the case-type page to assemble a deterministic html/pdf package with sha-256 of every input. the report should read as a server-side timeline cross-referenced against the client log, not a vendor blame essay.

the report should state:

  • MCP server identifier, deployment surface (self-hosted · vendor-hosted), wall-clock window
  • signed-manifest sha-256 vs as-deployed server binary sha-256
  • capability list / tool registration changes within the incident window
  • per-session list of tool calls the client requested vs the server executed
  • tool-result payloads that contain content the underlying data source did not produce
  • OAuth grants the server held and what downstream operations they enabled
  • which fatcousin tool produced each finding row and engine version from the audit anchors
  • sha-256 of each export in the preservation log
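for the capability-change line of the report, here is a sketch that lists grants inside the incident window that neither the signed manifest declares nor a preceding admin event for the same scope explains. the `GrantEvent` and `AdminEvent` shapes, and the five-minute default tolerance, are assumptions chosen for illustration. timestamps are assumed to carry explicit UTC offsets.

```typescript
// Shapes are assumptions; adapt them to the real ledger and admin-event exports.
interface GrantEvent {
  id: string;
  ts: string; // ISO-8601 with explicit offset
  scope: string; // e.g. a file system root or outbound host
}
interface AdminEvent {
  id: string;
  ts: string;
  scope: string;
}

// Report grants inside [windowStart, windowEnd] that the signed manifest does not
// declare and no admin event for the same scope, up to toleranceMs earlier, explains.
function unexplainedGrants(
  grants: GrantEvent[],
  admins: AdminEvent[],
  manifestScopes: Set<string>,
  windowStart: string,
  windowEnd: string,
  toleranceMs = 5 * 60_000, // hypothetical default
): GrantEvent[] {
  const start = Date.parse(windowStart);
  const end = Date.parse(windowEnd);
  return grants.filter((g) => {
    const t = Date.parse(g.ts);
    if (t < start || t > end) return false; // outside the incident window
    if (manifestScopes.has(g.scope)) return false; // declared up front
    const explained = admins.some((a) => {
      const at = Date.parse(a.ts);
      return a.scope === g.scope && at <= t && t - at <= toleranceMs;
    });
    return !explained;
  });
}
```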

the report must not claim:

  • attacker attribution beyond what attribution logs and trust chain support
  • complete enumeration of tampered tool results when the underlying data source state was not preserved
  • vendor responsibility allocation — that is counsel and contract, not the analyst
  • that the model or the agent malfunctioned — they may have behaved correctly against a lying server
  • vendor-native fidelity for any export the vendor-fidelity audit marks template-misfit; cite honest limits inline
  • chain-of-custody admissibility — fatcousin is a local triage workbench, not records management software

handing it off

  • platform / sre: server identifier, rotated credentials, capability ledger diff, deployed-manifest vs running-binary hash comparison, downstream OAuth rotations.
  • security / ir: tool-call trace cross-reference, suspected rewrite payload list, server outbound network artifacts, attribution chain summary.
  • vendor / supply-chain: binary hash, signed-manifest provenance, vendor portal export timestamps — for vendor support ticket and disclosure.
  • privacy / counsel: data categories touched, notification timeline, preservation memo with hashes; flag whether server runtime data crossed jurisdictional boundaries.
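the deployed-manifest vs running-binary hash comparison for platform / sre can be sketched as below. it assumes a hypothetical manifest shape listing artifact paths with expected sha-256 values. it deliberately does not verify the manifest's signature, which has to be checked separately with whatever signing tool produced it. the hasher is injectable so the comparison can be exercised without touching disk.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hypothetical manifest shape: artifact paths with expected sha-256 values.
interface Manifest {
  artifacts: { path: string; sha256: string }[];
}

interface HashComparison {
  path: string;
  expected: string;
  actual: string;
  match: boolean;
}

// Default hasher reads the artifact from disk; throws if the file is missing.
function sha256OfFile(path: string): string {
  return createHash("sha256").update(readFileSync(path)).digest("hex");
}

// Compare each as-deployed artifact against the manifest's recorded hash.
// This does NOT verify the manifest's signature; check that separately first.
function compareToManifest(
  manifest: Manifest,
  hashOf: (path: string) => string = sha256OfFile,
): HashComparison[] {
  return manifest.artifacts.map(({ path, sha256 }) => {
    const actual = hashOf(path).toLowerCase();
    const expected = sha256.toLowerCase();
    return { path, expected, actual, match: actual === expected };
  });
}
```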

court — declaration outline · expert-witness language

no plug-and-play citation snippet on this page — mcp server compromise declarations vary too much by server deployment surface and vendor. below is an outline counsel can adapt. this is not legal advice.

  • qualifications — dfir practice, MCP and agentic systems familiarity, tools used
  • materials received — list each export with sha-256 and collection timestamp; flag server-side vs client-side origin per file
  • methods — local browser analysis, deterministic tools, no upload of evidence to fatcousin servers
  • findings — numbered timeline tied to tool_call_id and audit-log row id, cross-referencing client-side and server-side records
  • limits — partial vendor schema fidelity, audit-log rotation gaps, no live server inspection, underlying data source state not preserved for every tool result
  • conclusion — server executed operations the client did not request and/or returned content the underlying data source did not produce

example expert-witness paragraph (orchid-shaped):

I analyzed exports from the orchid-lab-db-mcp server deployment serving the orchid research collective agent runtime from 2026-05-18 09:00:00 UTC through 09:10:30 UTC. Using locally executed forensic tools I reconstructed the server-side audit log, the client-side invocation log, and the permission grant ledger. Server audit row srv-00011 records inventory.export_bulk on tool_call_id tc-00011 while the client invocation log records inventory.lookup for the same tool_call_id. Tool-result payloads tr-00004 and tr-00006 contain imperative override text not present in the underlying postgres snapshot preserved at freeze time. The deployed server binary hash does not match the signed manifest sha-256. My conclusions are limited to the provided exports; I did not inspect live server runtime state.

further reading

reference investigation

synthetic fixture orchid-mcp-server-compromise — supply-chain tamper on a self-hosted postgres mcp server · client/server tool_call_id divergence · two tool-result rewrites · ten-minute window, seed orchid-mcp-server-compromise:v1. six primary tools have published goldens; four ndjson engines under-count semantic beats today — see smoke note in incident-context.json before comparing counts.

case playbook: case type tools · vertical: api-agentic-action · compare locally: npx tsx app/tools/__fixtures__/cases/orchid-mcp-server-compromise/generate.ts --verify · regen goldens: npx tsx scripts/fixtures/capture-orchid-mcp-goldens.ts
