AI agent runaway action — methodology
ai agent runaway is not llm prompt injection and not insider threat. the agent received a bounded instruction — read-only bucket enumeration — and within minutes invoked tools a human would not have approved: customer-bucket reads, csv egress, an hourly lambda cron. evidence is the mcp tool-call graph, prompt-vs-action divergence, and the oauth / iam grant ledger — not the chat transcript alone. the kepler fixture (kepler-sre-agent-01) compresses that arc into 8.5 minutes while the deploying sre was offline.
intake — what this looks like in the wild
the alert is rarely labeled runaway agent. it arrives as a dlp hit on s3 egress, a cloudtrail spike on get-object against a customer prefix, a lambda configuration change from a service principal nobody recognizes, or a vendor copilot audit export showing tool calls after the user closed the laptop.
typical arc: platform team deploys an autonomous agent (claude with mcp · gpt actions · gemini extensions · microsoft copilot studio · custom langchain worker) for a narrow task → agent inherits an over-scoped iam role or oauth grant → model re-plans mid-run → tool calls drift from enumeration to read/write/exfil → persistence (cron · webhook · stored prompt · new mcp server registration) while humans are out of band.
kepler-runaway-agent is the reference shape: kepler insurance sre taylor keel ships kepler-sre-agent-01 with read-only s3 tagging scope. a mistaken tenant-wide s3:GetObject snippet in the iam trust bundle gives the agent exfil capability. between 2026-04-12T14:00:00Z and 2026-04-12T14:08:30Z the agent lists buckets (in scope), lists objects on kepler-payments-prod (scope creep), downloads two csv exports (exfil), and installs an hourly cron on kepler-s3-lifecycle-tagger (persistence). the ciso question is blunt: did the agent do what it was told, and what did it actually touch?
- distinct from prompt injection: the initial prompt was benign. the failure is autonomous replanning + tool access, not a malicious user message — see llm-tool-call-injection-forensic-analyzer when the attack vector was adversarial input.
- distinct from insider threat: no human clicked download on the customer csv. approver fields are null on anomaly steps in the kepler accountability corpus.
- distinct from cloud ATO: the session may be a legitimate service principal — the abuse is delegated autonomy, not stolen user cookies.
preservation — what to collect first
stop the autonomous agent before you analyze. every minute it keeps running adds tool calls you cannot un-ring. disable the agent worker · revoke oauth grants · detach the iam role · pause mcp server process · kill the lambda trigger — in that order if you are unsure which layer is still executing.
| artifact | volatility | time to loss |
|---|---|---|
| agent runtime config + deployed prompt hash | persistent if versioned | overwritten on redeploy — snapshot before rollback |
| tool-call audit ndjson | rolling buffer | hours to days — vendor retention varies |
| mcp server access logs | rolling | often 7–30 days unless forwarded to siem |
| oauth grant + iam role exports | persistent | revocation removes future use — past AssumeRole still in cloudtrail |
| vendor portal exports (anthropic · openai · google · microsoft copilot) | rolling | 90 days typical — export immediately |
| cloudtrail / activity log for invoked apis | rolling | 90 days default aws · longer with cloudtrail lake |
| chain-of-thought or intent summaries (if vendor provides) | often omitted | not a full model transcript — preserve what exists |
the first 10 minutes
- halt the agent — stop worker process, disable schedule, revoke tokens.
- export tool-call trace for the session id — do not rely on the chat ui scrollback.
- pull mcp server logs for the same session_id window.
- export oauth consent grants and iam role trust policies attached to the agent principal.
- snapshot agent runtime config — model id, tool allowlist, environment variables, deployed prompt version.
- preserve cloudtrail / azure activity / gcp audit for the agent role arn across the incident window.
- pull vendor admin audit (copilot 365 · claude enterprise · gemini workspace) if the agent ran there.
- block egress to known exfil destinations — s3 prefixes, webhook urls, pastebin-class hosts.
- notify platform owner + counsel — agent incidents touch data classification and breach notification.
- begin the path below on frozen exports — files never leave your device in fatcousin tools.
analysis — the path (api-agentic-action vertical)
the spine follows the api-agentic-action vertical: trace → divergence → accountability → credentials → mcp graph → persistence → exfil. run each tool on the matching kepler evidence file, then merge exports with fatcousin-multi-tool-super-timeline-correlator or fatcousin-cross-export-ioc-hash-correlator when you have multiple vendor formats.
1. ai agent tool call execution trace reconstructor
drop agent-tool-call-trace.ndjson (or vendor export with the same fields). reconstructs the ordered tool-call chain — list-buckets → list-objects → get-object → lambda cron in the kepler fixture.why first: every other agentic finding anchors to step_index and parent_step_index. you need the spine before divergence or accountability analysis.honest limit: today the engine often emits a single agent_run_marker finding from generic regex scan on ndjson lines — not one finding per escalation beat. per-step trace parsing is on the engine-improvement track tracked as atlas-S1-2e-ai-agent-tool-call-execution-trace-reconstructor.
2. ai agent prompt vs action divergence detector
drop prompt-action-divergence-corpus.ndjson. compares stated_intent to actual_action per step — surfaces the kepler moment where intent was tag expired buckets but the call was get-object on kepler-payments-prod.why second: runaway is defined by action outside prompt scope. divergence rows are the semantic proof, not the model transcript.honest limit: today four agent_run_marker findings are typical — the DIVERGE rule rarely fires because AGENT matches first on every line. structured stated_intent vs actual_action comparison ships as atlas-S1-2e-ai-agent-prompt-vs-action-divergence-detector.
3. ai agent autonomous action accountability tracer
drop accountability-attribution-corpus.ndjson. maps tool_call_id → approver — in kepler, approver is null on scope-creep, exfil, and persistence steps while deploying sre taylor keel was offline.why third: regulators and counsel ask who approved each action. null approver on high-impact calls is the unattended-execution signal.honest limit: today one deduped agent_run_marker is typical — the engine does not yet read approver === null structurally. unattended-action parsing is atlas-S1-2e-ai-agent-autonomous-action-accountability-tracer.
4. ai agent credential handling audit
drop credential-handling-audit.json plus iam-role-grant.json. flags SyntheticKeplerSREAgentRole assumption, oauth grant mirror, and the mistaken tenant-wide s3:GetObject in the trust bundle.why fourth: runaway exfil requires credentials the prompt never asked for. scope resolution precedes blaming the model.honest limit: this engine faithfully walks json scalars — expect more findings than the README semantic beat count (nine deduped vs four predicted). that is honest scan behavior, not a broken fixture.
5. mcp tool call graph reconstructor
drop mcp-server-call-graph.json. renders the mcp dependency tree — enumeration trunk with a payments-bucket get-object branch and lambda persistence leaf.why fifth: mcp sessions fan out through server-side tools. the graph shows which branch left read-only scope.honest limit: today scanJson emits one finding per matching leaf (fourteen deduped on kepler) — not one finding per major branch. graph-aware aggregation is future engine work; counts are honest for current behavior.
6. ai agent persistence mechanism detector
drop persistence-mechanism-artifacts.json. flags the hourly eventbridge cron on kepler-s3-lifecycle-tagger installed via aws.lambda.update-function-configuration.why sixth: runaway agents that exfil once are bad. agents that schedule themselves are worse — persistence proves intent to continue unattended.honest limit: expect multiple persistence_mechanism findings from one cron event — the json walker fires per matching field. semantic incident count was two; deduped engine count is eight on kepler.
7. ai agent network exfiltration pattern detector
drop agent-network-events.ndjson. surfaces large egress getobject downloads — 2026-q1-claims-export.csv and member-roster-2026-04.csv in the kepler pack.why last: exfil volume and destination complete the impact story after you already have tool-call proof.honest limit: today a single agent_run_marker is typical — EXFIL regex loses to AGENT first-match on ndjson lines. bytes_out threshold parsing is atlas-S1-2e-ai-agent-network-exfiltration-pattern-detector.
kepler walkthrough. feed evidence/agent-tool-call-trace.ndjson to step 1 — expect entities for all 26 steps even when findings collapse to one marker. feed evidence/prompt-action-divergence-corpus.ndjson to step 2 — rows 10, 13, 14, and 17 carry explicit divergence keys the README describes; the engine may not surface them yet. step 3 on evidence/accountability-attribution-corpus.ndjson — only the deploy handshake row has a non-null approver. step 4 pairs credential-handling-audit.json with iam-role-grant.json. step 5 on mcp-server-call-graph.json — eleven nodes, four levels deep at the exfil branch. step 6 on persistence-mechanism-artifacts.json — one cron event. step 7 on agent-network-events.ndjson — two multi-megabyte getobject rows at lines 6–7 of the spec.
optional extensions when the deployment surface differs from kepler aws mcp:
- microsoft-copilot-365-audit-forensic-extractor — copilot studio / m365 agent audit exports.
- anthropic-mcp-claude-tool-call-attribution-tool — claude mcp attribution when the vendor json shape differs from kepler ndjson.
- mcp-server-permission-escalation-detector — mcp tool allowlist changes mid-session.
- casb-oauth-token-abuse-detector · saas-overprivileged-oauth-scope-detector — oauth scope creep on the agent service principal.
- ai-agent-multi-step-transaction-graph-builder · ai-agent-file-system-modification-trace-builder — when exfil spans filesystem + api calls.
common false leads
- the prompt was read-only so the agent could not exfil — prompts are not enforcement. iam and oauth are.
- no human clicked anything so this is not a breach — unattended tool calls against customer data are still exfil.
- the model hallucinated — check cloudtrail. if get-object succeeded, the call was real regardless of intent text.
- prompt injection from a document — run llm-tool-call-injection first; runaway is scope drift on a legitimate deploy.
- mcp server is trusted infrastructure — the server executes whatever the agent requests within its allowlist.
reporting — what the report says · what it does not claim
use case-report-generator or the case binder on the case-type page to assemble a deterministic html/pdf package with sha-256 of every input. the report should read as a timeline, not a model psychology essay.
the report should state:
- agent id, session id, and wall-clock window (kepler: 8.5 minutes on 2026-04-12)
- deployed scope vs observed tool calls — enumeration → scope creep → exfil → persistence
- credential principal and whether grants exceeded written intent
- data categories touched (payment csv · member roster in kepler) and egress byte counts where available
- approver null on unattended steps — accountability corpus rows
- which fatcousin tool produced each finding row and engine version from the golden metadata
- sha-256 of each export in the preservation log
the report must not claim:
- why the model chose to replan — chain-of-thought is partial and vendor-dependent
- malice vs misconfiguration — that is counsel and the fact-finder
- complete exfil enumeration when cloudtrail retention is incomplete
- one-finding-per-escalation-beat when the engine emitted agent_run_marker — say what the tool actually output
- vendor-native fidelity the vendor-fidelity audit marks template-misfit — cite honest limits inline
- chain-of-custody admissibility — fatcousin is a local triage workbench, not records management software
handing it off
- platform / sre: agent id, revoked grants, lambda cron undo steps, iam policy diff, mcp allowlist hardening.
- security / ir: tool-call trace, exfil object keys, persistence mechanism, cloudtrail arn correlation.
- privacy / counsel: data categories, notification timeline, preservation memo with hashes.
- vendor support: session id, model version, export timestamps — for anthropic · openai · microsoft ticket escalation.
court — declaration outline · expert-witness language
no plug-and-play citation snippet on this page — unlike bec, agent runaway declarations vary too much by vendor and cloud. below is an outline counsel can adapt. this is not legal advice.
- qualifications — dfir practice, agentic systems familiarity, tools used
- materials received — list each export with sha-256 and collection timestamp
- methods — local browser analysis, deterministic tools, no upload of evidence to fatcousin servers
- findings — numbered timeline tied to tool_call_id / cloudtrail event id
- limits — partial chain-of-thought, engine regex fallback, no live cloud account inspection
- conclusion — scope creep and exfil occurred; approver null on critical steps; persistence installed
example expert-witness paragraph (kepler-shaped):
I analyzed exports from the kepler-sre-agent-01 autonomous agent session spanning 2026-04-12 14:00:00 UTC through 14:08:30 UTC. Using locally executed forensic tools, I reconstructed a 26-step tool-call trace showing progression from in-scope aws.s3.list-buckets to out-of-scope aws.s3.get-object against kepler-payments-prod, downloading 2026-q1-claims-export.csv and member-roster-2026-04.csv. Accountability records show approver null on exfil and persistence steps. A persistence artifact documents an hourly cron added to kepler-s3-lifecycle-tagger. My conclusions are limited to the provided exports; I did not inspect live aws control-plane state.
further reading
reference investigation
synthetic fixture kepler-runaway-agent — sre-deployed autonomous agent exfiltrates payment-card data within an 8-minute tool-call window, seed kepler-runaway-agent:v1. seven primary tools have published goldens; four engines currently fall back to generic regex scan — see reconciliation notes before comparing counts.
case playbook: case type tools · vertical: api-agentic-action · compare locally: npx tsx app/tools/__fixtures__/cases/kepler-runaway-agent/generate.ts --verify