mcp server compromise vs agent runaway
an LLM integration produced an unexpected tool call. agent runaway is autonomous re-planning by an agent against a clean server — the failure is in the agent's tool-call chain. server compromise is the MCP server itself being malicious or impersonated — the server returns tampered tool definitions or executes operations the client never requested. wrong call sends you to MCP trace reconstruction when the server log shows tool-definition drift, or vice versa.
primary tools · side by side
ordered entry points from the case-type taxonomy. highlighted rows appear in both case types' editorial tool lists.
MCP server compromise
the MCP (Model Context Protocol) server itself is the failure locus — leaked server credentials, impersonated server identity, server-side tool-definition tampering, or permission escalation in the server's tool-grant ledger. evidence is the server audit log, the client-invocation trail showing what the LLM thinks it called vs what the server actually executed, the tool-call attribution graph, and the OAuth scope grant ledger. distinct from ai-agent-runaway (agent did this with a benign server) and llm-prompt-injection (input bent the model · server was clean). a compromised server can fool both honest models and honest agents.
- 01mcp model context protocol server audit log forensic analyzerdrop mcp server audit log · parse tool calls + resource accesses + auth · runs locally
- 02mcp client invocation log forensic analyzerdrop mcp client invocation log · parse server calls + arguments + responses · runs locally
- 03mcp server permission escalation detectordrop mcp server audit log · detect over-permissioned tool exposure · runs locally
- 04mcp tool call graph reconstructordrop mcp client + server log set · reconstruct tool-call dependency graph · runs locally
- 05mcp prompt injection via tool result detectordrop mcp server tool result log · detect injection payloads in tool responses · runs locally
- 06anthropic mcp claude tool call attribution tooldrop claude tool call log · attribute each tool call to model decision · runs locally
AI agent runaway action
an autonomous agent (Claude · GPT · Gemini · Copilot · custom MCP) takes actions outside its prompt scope — reads credentials it shouldn't, exfils data, installs persistence, calls an MCP tool a human wouldn't have approved. evidence is the tool-call trace, the prompt-action divergence, and the OAuth grant ledger — not the prompt itself. distinct from llm-prompt-injection (malicious input) and insider-threat (human actor).
- 01ai agent tool call execution trace reconstructordrop agent run log · reconstruct tool-call sequence + state mutations · runs locally
- 02ai agent prompt vs action divergence detectordrop agent run log · detect actions taken inconsistent with prompt · runs locally
- 03ai agent autonomous action accountability tracerdrop agent run log · trace responsibility for each autonomous action · runs locally
- 04ai agent credential handling auditdrop agent run log · audit credential usage + leakage risk · runs locally
- 05mcp tool call graph reconstructordrop mcp client + server log set · reconstruct tool-call dependency graph · runs locally
- 06ai agent persistence mechanism detectordrop agent + system state · detect persistence implanted by agent · runs locally
- 07ai agent network exfiltration pattern detectordrop agent network log · detect data exfiltration via agent · runs locally
editorial overlap
lean toward…
disambiguation signals derived from case-type descriptions and common practitioner confusion points.
lean toward mcp server compromise if you see…
- MCP server audit-log shows tool-definition or capability-list drift between sessions · server signature changes mid-session · server-side handler invoked an operation no client requested
- client-invocation log diverges from server-side execution log — the client thinks it called tool X, the server actually ran tool Y · attribution graph breaks at the server boundary
- OAuth scope grant ledger on the server shows escalation without a corresponding client grant event · server tool-permission edit by an unattributable principal
lean toward agent runaway if you see…
- agent prompt-vs-action divergence on a server whose audit log shows stable tool-definitions, consistent signature, and clean per-session capability list
- tool-call execution trace on the agent side anchored to one session · server log shows the agent's requests arrived faithfully and the server executed exactly what was asked
- agent persistence mechanism (cron · webhook · scheduled lambda) added by the agent against a server with no tampered-definition or impersonation signal