deepfake investigation (video / audio / image) - methodology
deepfake investigation is not a vibe check on whether something looks weird. it is a structured pass across video motion, face integrity, generative stills, microphone-level audio, and sensor noise. the regional news anchor impersonation pattern stacks real station branding with a swapped talking head, a GAN-smoothed portrait for shares, a spliced voice track, and sometimes a copy-moved background patch to hide editing. evidence is fragile the moment a platform re-encodes or a helpful editor normalizes levels. you preserve first, measure second, and separate each claim so counsel and partners see which layer failed.
what evidence exists and how fast it dies
volatility is not only about deletion. recompression, denoise, and AI upscaling are equally destructive. treat every handoff as a chance to bake in new artifacts. the table below lines up what you usually have in an anchor-impersonation scenario, how stable it is, and when it typically becomes useless for sub-pixel forensics. adjust rows for your jurisdiction and platform mix.
| artifact | volatility | time to loss |
|---|---|---|
| original video container from leaker device | persistent if isolated | minutes once auto-upload services re-encode |
| raw or PCM audio export before podcast mastering | semi-persistent | hours if passed through voice cleanup or chain normalizers |
| camera-original stills for PRNU reference | persistent | days if someone crops, denoises, or runs face retouch |
| social platform derivative (720p, variable bitrate) | persistent but degraded | immediate loss of fine PRNU and many swap edges |
| station master or satellite feed archive | persistent at broadcaster | weeks if storage rotation or legal hold gap |
| metadata (container timestamps, handset EXIF) | mixed | stripped on first re-export depending on exporter |
| witness memory of acquisition path | volatile | hours before narratives solidify incorrectly |
the first 10 minutes
this list is preservation and scope. no hot takes. no chain-forwarding inside the newsroom Slack where clients recompress automatically. clock starts when counsel or the trust team hands you binaries.
- isolate the newest copy that is closest to acquisition. screenshot the DM, email headers, or internal CMS download panel with UTC time. you need to prove what file version you touched.
- hash every incoming file twice (SHA-256 minimum) before playback. write hashes into your notes and into the package you send upstream. hashing is cheap insurance against later swaps.
- copy evidence to encrypted offline storage immediately. spinning rust or agency-approved SSD, not cloud sync folders that preprocess video.
- if the clip is nested inside a chat export archive, carve the media out without transcoding the video track.
- pull a known-good reference clip from the station official feed closest in date to the impersonation publish window. grab both wide and CU shots if licensing allows PRNU comparisons.
- document device model and OS if someone recorded a rebuttal on a phone. you will explain later why certain color pipelines differ from studio cameras.
- freeze any internal editor project files. timeline XML, Audition sessions, or Resolve databases show import paths and warp stabilizer passes that flatten forensic cues.
- label audio channels explicitly. narration vs room tone vs duck music matters for splice detectors and TTS analyzers downstream.
- log every transform already applied ("denoised in Descript", "touched in CapCut"). those go in the caveat block of every finding.
- only after hashing and copying, launch the methodology path locally. FatCousin tools never transmit your files. that does not excuse unsafe sharing on corp laptops whose disk-encryption policy nobody has checked.
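the hashing step above can be sketched in a few lines of Python. this is a minimal sketch, not a FatCousin tool. it reads "twice" as two independent read passes over the same file, which is an assumption, and the names `sha256_file`, `hash_twice`, and `build_manifest` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large containers never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_twice(path):
    """Hash the same file in two separate passes and stop if the results differ.

    Assumption: "twice" means two reads of the same bytes, which catches
    unstable media or a file still being written.
    """
    first = sha256_file(path)
    second = sha256_file(path)
    if first != second:
        raise RuntimeError(f"unstable read for {path}: {first} != {second}")
    return first


def build_manifest(evidence_dir):
    """Return {relative_path: sha256_hex} for every regular file under evidence_dir."""
    root = Path(evidence_dir)
    return {
        p.relative_to(root).as_posix(): hash_twice(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

write the returned mapping into your notes and into the upstream package. rerunning `build_manifest` on the receiving side and comparing the two mappings shows whether any file changed in transit.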
the path
ordered by how fast motion and generative artifacts degrade. each step narrows hypotheses. rerun later with a cleaner original if procurement wins a better file.
1. video deepfake analyzer
drop the contested clip as the original container or a lossless remux. reads frame timing, motion coherence, and facial-region stability across a short window. the synthetic variant in this class of case often arrives as a tight strip of frames or a re-encoded social clip. you are looking for temporal seams that a still thumbnail will never show.
why first: impersonation videos live or die on motion. if the face is composited or swapped, frame-level metrics and lip-track inconsistency surface before you burn time on a single JPEG headline image.
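frame timing is the cheapest of those cues to check by hand. the sketch below assumes you already have per-frame presentation timestamps from some container tool. it is not the analyzer's actual metric, and `timing_anomalies` and its `tolerance` parameter are hypothetical.

```python
from statistics import median


def timing_anomalies(pts, tolerance=0.25):
    """Return (frame_index, gap) pairs where the gap before a frame deviates
    from the median frame interval by more than `tolerance` (a fraction).

    `pts` is a list of presentation timestamps in any consistent unit.
    """
    if len(pts) < 3:
        return []
    gaps = [later - earlier for earlier, later in zip(pts, pts[1:])]
    typical = median(gaps)
    if typical <= 0:
        raise ValueError("timestamps must be increasing")
    return [
        (index + 1, gap)
        for index, gap in enumerate(gaps)
        if abs(gap - typical) > tolerance * typical
    ]


# 25 fps cadence in milliseconds with one long gap before frame 3
print(timing_anomalies([0, 40, 80, 200, 240]))
```

an irregular gap is a prompt to look closer, not a finding. variable frame rate capture and dropped frames produce the same pattern without any synthesis.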
2. face swap artifact detector
crop to the talking head if you must, but prefer full frames so boundary cues stay in context. highlights mask edges, color transfer seams, and resolution mismatch between the inner face oval and the rest of the scene. political and news-forgery workflows love a trusted anchor still pasted onto an unrelated body or background plate.
why second: after motion stress, you need a dedicated pass on swap-specific cues. GAN heads and broadcast B-roll can both look sharp. swap boundaries are often the cleanest explanation when the audio already looks edited.
3. gan fingerprint detector
feed stills that claim to be camera originals, especially hero headshots pulled from the viral post. frequency-domain grid and upsampling fingerprints appear when a face was synthesized or heavily GAN-refined even if social compression tries to smear them. treat hits as probabilistic. combine with PRNU and chain-of-custody, not as a single smoking gun.
why third: synthetic headshots propagate fast in regional news scams. attackers pair a plausible studio portrait with swapped video. pinning a GAN-weight signature early stops the investigator from arguing about lighting when the Fourier floor already looks wrong.
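to make "frequency-domain grid" concrete, here is a one-dimensional toy. it assumes a periodic ripple of known period (for example period 2 from a 2x upsampler) and compares the spectral bin for that period against the median of the other bins. it is a teaching sketch with a naive DFT, not the detector's method, and `dft_magnitudes` and `grid_peak_ratio` are hypothetical names.

```python
import cmath
import math
from statistics import median


def dft_magnitudes(row):
    """Magnitudes of the one-sided DFT of a mean-removed row of pixel values."""
    n = len(row)
    mean = sum(row) / n
    centered = [value - mean for value in row]
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * t / n)
                for t, v in enumerate(centered)))
        for k in range(n // 2 + 1)
    ]


def grid_peak_ratio(row, period):
    """Ratio of the spectral bin for a ripple of `period` pixels to the median
    magnitude of the remaining bins. Large values suggest a periodic grid."""
    n = len(row)
    if period < 2 or n % period:
        raise ValueError("row length must be a multiple of the period")
    magnitudes = dft_magnitudes(row)
    target = n // period
    others = [m for k, m in enumerate(magnitudes) if k not in (0, target)]
    return magnitudes[target] / max(median(others), 1e-12)
```

a high ratio on one row proves little. real detectors work on two-dimensional spectra across many patches, and resampling by an honest editor leaves similar peaks.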
4. ai synthetic voice generation artifact analyzer
load the separated speech WAV or original mux audio without normalizing aggressively. analyzes voicing quirks, unnatural stability, and TTS-era micro-artifacts under pitch tracking. clones are good now. brittle claims are worse than honest uncertainty. record what you see before denoise plugins flatten the waveform.
why fourth: once video and face passes are queued, prioritize the apology clip or bogus statement audio. impersonation scams increasingly lead with plausible timbre while the fine structure still trips statistical detectors.
5. audio splice detector
same audio file, complementary lens. hunts cross-fade seams, bitrate stair-steps, duplicated noise floors, and room-impulse inconsistencies between adjacent phrases. splice logic matters when attackers stitch a genuine cold open from the anchor to synthetic scandal lines generated offline.
why fifth: synthetic-voice detectors can disagree on tone while cut-and-paste edits leave boundary fingerprints. run both. the Arias-style fixture encodes that split. your report should separate model-voice concerns from edit-seam concerns.
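one of those cues, a step in the noise floor between adjacent segments, can be illustrated with windowed RMS levels. the sketch assumes mono float samples scaled to [-1, 1] and is meant for pauses or room tone, since speech onsets also produce level jumps. `window_rms_db`, `level_steps`, and the default threshold are hypothetical.

```python
import math


def window_rms_db(samples, window):
    """RMS level in dB relative to full scale for each non-overlapping window."""
    levels = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        mean_square = sum(x * x for x in chunk) / window
        # 10*log10(mean square) equals 20*log10(RMS); clamp to avoid log(0)
        levels.append(10 * math.log10(max(mean_square, 1e-12)))
    return levels


def level_steps(samples, window=1024, threshold_db=6.0):
    """Return sample offsets where the level jumps by more than threshold_db
    between one window and the next."""
    levels = window_rms_db(samples, window)
    return [
        (index + 1) * window
        for index in range(len(levels) - 1)
        if abs(levels[index + 1] - levels[index]) > threshold_db
    ]
```

run it only over stretches you have labeled as room tone. a flagged offset marks where to listen and inspect, and it belongs in the report as an edit-seam concern, separate from any model-voice score.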
6. ela detector
works on distributed JPEGs and recompressed PNG exports. error level analysis highlights regions that survived a different compression generation than their neighbors. useful for pasted logos, airbrushed skin blocks, and rescaling before upload. remember ELA is fragile on platform transcodes. pair with originals when they exist.
why sixth: after dedicated face and audio tooling, a fast compression map finds lazy compositing in key stills. news desk graphics and crisis screenshots often compress twice. ELA tells you where the edit stack diverged.
7. prnu fingerprinter
compare suspect stills against reference frames you trust. PRNU correlation ties an image to a specific sensor when both sides are minimally processed. official station promos, satellite liveshots, and wire photos each carry different noise residues. weak lighting and heavy denoise reduce confidence. document every crop and scale step you tried.
why seventh: deepfakes lean on authentic elements. a PRNU match on the background plate while the face fails swap checks is a strong story. no match on the hero portrait while the B-roll matches can point to a synthetic insert taken from a different pipeline.
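the correlation at the heart of a PRNU comparison is a normalized cross-correlation between two noise residuals. the sketch below assumes the residuals were already extracted elsewhere (image minus a denoised copy) and flattened to equal-length lists. the thresholds are deliberately left as required arguments because usable values depend on your reference corpus. `normalized_correlation` and `label_score` are hypothetical names.

```python
import math


def normalized_correlation(residual_a, residual_b):
    """Pearson-style normalized correlation between two equal-length noise residuals."""
    if len(residual_a) != len(residual_b) or not residual_a:
        raise ValueError("residuals must be non-empty and the same length")
    n = len(residual_a)
    mean_a = sum(residual_a) / n
    mean_b = sum(residual_b) / n
    numerator = sum((a - mean_a) * (b - mean_b)
                    for a, b in zip(residual_a, residual_b))
    energy_a = sum((a - mean_a) ** 2 for a in residual_a)
    energy_b = sum((b - mean_b) ** 2 for b in residual_b)
    denominator = math.sqrt(energy_a * energy_b)
    return numerator / denominator if denominator else 0.0


def label_score(score, match_threshold, reject_threshold):
    """Three-way label so a weak score is reported as inconclusive, not as a mismatch.

    Both thresholds are caller-supplied assumptions, calibrated on known pairs.
    """
    if score >= match_threshold:
        return "match"
    if score <= reject_threshold:
        return "no match"
    return "inconclusive"
```

keeping an explicit "inconclusive" band is the point of the second function. a score depressed by platform scaling should land there rather than be reported as a mismatch.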
8. copy move forgery detector
full-frame newsroom shots are easy to doctor with duplicated crowd tiles, cloned desk objects, or duplicated anchor garments to hide tracking errors. copy-move search finds block-level repetition that human reviewers glaze past. run it late so you are not chasing duplicates from legitimate tiled video compression.
why last: once identity and generation questions are logged, scene-level integrity closes the loop. copying a clean patch of the set to cover a bad mask is older than diffusion models. it still appears beside modern face tools.
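block-level repetition search can be sketched as exact matching of small pixel blocks. this toy version assumes a grayscale image as a list of rows and only finds bit-identical copies. practical detectors use compression-tolerant features instead, so treat it as an illustration. `duplicate_blocks` and its parameters are hypothetical.

```python
from collections import defaultdict


def duplicate_blocks(pixels, block=4, min_offset=4):
    """Return pairs of (row, col) positions whose block x block pixel contents
    are identical and at least min_offset apart on one axis."""
    height, width = len(pixels), len(pixels[0])
    seen = defaultdict(list)
    for y in range(height - block + 1):
        for x in range(width - block + 1):
            key = tuple(tuple(pixels[y + dy][x:x + block]) for dy in range(block))
            # flat blocks (sky, walls, graphics fills) repeat legitimately
            if len({value for row in key for value in row}) == 1:
                continue
            seen[key].append((y, x))
    pairs = []
    for positions in seen.values():
        for i, (y1, x1) in enumerate(positions):
            for y2, x2 in positions[i + 1:]:
                if max(abs(y1 - y2), abs(x1 - x2)) >= min_offset:
                    pairs.append(((y1, x1), (y2, x2)))
    return pairs
```

skipping flat blocks and requiring a minimum offset are small guards against the legitimate repetition mentioned above. they reduce noise but do not replace a human look at where the duplicated patches sit.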
common false leads
- bloodshot eyes equals fake: heavy key light, allergies, long hits, or compression mosquito noise look uncanny without any generative pipeline. corroborate with swap metrics and splice audio, not aesthetics alone.
- minor lip jitter proves deepfake: Skype-grade packet loss, CFR-to-VFR mishandling, and warp stabilizers all chew mouth shapes. reconcile against container timing before accusing.
- JPEG blocks mean pasted face: social recompression can tile the whole raster. isolate whether anomalies cluster on the oval or uniformly across sweaters and graphics.
- PRNU mismatch ends the inquiry: heavy platform scaling kills PRNU coherence even on real shots. widen the corpus of references or downgrade the claim from match to inconclusive transparently.
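- synthetic voice positive equals fake statement: modern clones track prosody eerily well, and genuine speech pushed through aggressive cleanup can still trip detectors, so a score alone settles nothing. treat analyzer scores as corroborating context next to splice seams, witness timeline, and known-good anchors.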
- copy-move hit always means malign intent: news GFX templates legitimately reuse texture fills. contextualize duplication location against editorial norms before alleging forgery.
what we can tell you, what we can't
FatCousin runs deterministic heuristics in your browser without shipping media to our servers. that model changes what honesty sounds like.
we can tell you:
- frame-wise video cues that resemble deepfake authoring patterns and temporal incoherence
- localized face boundary and color seam signals consistent with compositing workflows
- frequency fingerprints often associated with GAN upsamplers on still portraits
- statistical deviations suggestive of TTS or neural voice conversion when audio is minimally mangled
- evidence-level hints of cross-bitrate seams and duplicated noise floors indicative of editing
- PRNU correlation, or the lack of it, against reference stills expected to come from the same device class
- copy-move duplication maps that flag cloned scene patches deserving human review
we can't tell you:
- intent, criminal liability, or which individual pressed export. prosecutors and courts decide that.
- model vendor attribution (Stable Diffusion vs proprietary API) absent embedded provenance blobs.
- certainty on clips that endured unknown platform stacks beyond what you preserved.
- whether the victim voluntarily participated versus coercion. escalate to counselors and sworn statements.
- network provenance across accounts without access to platform logs we never touch.
handing it off
package reproducible hashes, untouched originals, annotated screenshots per tool stage, plus a blunt list of caveats tied to destructive transforms users already admitted. impersonation crises cross legal and safety lines fast. defer final language to specialists.
- broadcast legal / news standards: tool outputs (CSV summaries, flagged frame indices), curator notes on reference clip licensing, drafts that separate facts from probabilistic wording.
- law enforcement cyber units: hash manifest, timelines of first observation, IOCs extracted from downloader URLs or metadata, uncompressed audio where available.
- outside counsel handling defamation: chain-of-custody affidavit outline, sworn statements on who touched the clips, glossary of forensic terms spelled for non-specialists.
- insurer cyber or media-liability desks: incident summary emphasizing confirmed manipulations vs open questions so reserves map to reality.
- safety-minded station security: doxing risk for the impersonated anchor, credential reset recommendations, escalation if synthetic audio targets internal phone trees.
- trusted partner NGOs or OSINT desks: only after legal clears sharing. never dump raw identity details into public threads.
further reading
- NSA / FBI / CISA cybersecurity information sheet on organizational deepfake threats
- NIST SP 800-86, integrating forensic techniques into incident response
- NIST CFTT background on validated digital forensics workflows
- ENISA threat landscape reports on influence and synthetic content risk context
- MITRE ATT&CK (use for mapping influence or compromise adjacent TTPs when nation-state angles surface)
reference investigation
synthetic fixture arias-deepfake-investigation models regional anchor impersonation with PRNU-matched stills, a face-swap composite, a GAN-grid headshot, spliced voice audio, copy-move scene forgery, and a six-frame video metrics strip echoing quick-turn social posts. fixture seed arias-deepfake-investigation:v1. after you reproduce outputs locally, align against repository goldens using npm run check:flagship.
fixture download: evidence zip · proof page: /forensics/proof/arias-deepfake-investigation · case playbook: case type tools