document forgery / disputed authenticity — methodology
document forgery is not "does it look wrong on screen." it is whether the bytes you received match the revision history the sender claims. signed PDFs can be incrementally edited after signature. DOCX branches diverge on author, revision count, and template lineage while exporting to identical-looking PDFs. legacy Word binaries still carry ghost text in free sectors. preserve originals before any re-save, print-to-PDF, or accept-all changes — those operations destroy the evidence you need.
what evidence exists and how fast it dies
| artifact | volatility | time to loss |
|---|---|---|
| original PDF bytes (incremental chain intact) | persistent if preserved | destroyed on re-save, print-to-PDF, or Acrobat "save as optimized" |
| post-signature xref append + /Prev trailer | embedded in file | flattened away on single-revision export |
| Word tracked-change XML (w:ins / w:del / w:moveFrom) | persistent in draft branch | stripped on accept-all or clean PDF export |
| DOCX core.xml + app.xml metadata | persistent until overwrite | author/revision fields reset on template re-save |
| digital signature byte-range coverage | embedded in signed revision | no longer covers the whole file after any append, but the badge may still display |
| legacy OLE .doc free-sector ghost text | persistent until re-save in Word | lost when file is opened and saved in modern Word |
| SharePoint / DMS version history | rolling at vendor | 90 days to years depending on tenant policy; export at triage |
| email attachment .eml with delivery headers | persistent if saved | forwarding destroys headers; re-download may serve a newer version |
the first 10 minutes
- stop opening the disputed PDF in desktop readers until you have a hash and a structural map; OpenAction JavaScript can fire on open.
- copy every version you have — email attachment, share link download, USB handoff — before any edit or re-save.
- sha-256 hash each file; record filename, source, and UTC collection time.
- collect all sibling branches: draft DOCX, clean export, signed PDF, any legacy .doc or scanned TIFF.
- export document-management version history if the file lived on SharePoint, Google Drive, or iManage.
- save the delivery email as .eml — who sent which attachment when matters for chain of custody.
- do not accept "print to PDF" or "save as flattened" substitutes for the original bytes.
- photograph the signature panel if counsel wants it — but the file hash is the evidence, not the screenshot.
- freeze further edits on the disputed document set until preservation is complete.
- begin the path below.
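the hashing and logging steps above can be sketched in a few lines. this is a minimal sketch assuming Python 3 and the standard library only; `sha256_file`, `manifest_entry`, and the field names are illustrative, not part of any tool named in this playbook. it opens files read-only and never writes to them.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large exhibits do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:  # read-only: the original bytes are never touched
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_entry(path: Path, source: str) -> dict:
    """One manifest row: filename, source label, size, hash, UTC collection time."""
    return {
        "filename": path.name,
        "source": source,  # hypothetical labels, e.g. "email attachment"
        "size_bytes": path.stat().st_size,
        "sha256": sha256_file(path),
        "collected_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
```

call `manifest_entry(Path("copy.pdf"), "email attachment")` once per collected copy and store the rows apart from the evidence set, so the manifest itself cannot be edited alongside the files.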
the path
1. pdf object explorer
drop the disputed PDF first. maps the object tree, /Sig widgets, /JS OpenAction, embedded /Filespec, and current Info metadata without opening the file in a reader that executes scripts.
why first: a signed contract PDF can carry live JavaScript and embedded executables. you need the structural map before anything triggers on open.
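for a sense of what a structural pass looks for, here is a rough sketch that counts risky name tokens in the raw bytes without rendering anything. it is a stand-in under stated assumptions, not the pdf object explorer itself: `RISKY_NAMES` and `count_pdf_names` are hypothetical, and a raw scan misses names stored inside compressed object streams or written with #xx hex escapes.

```python
import re

# Name tokens worth flagging before the file goes anywhere near a reader.
RISKY_NAMES = (
    b"/OpenAction", b"/AA", b"/JS", b"/JavaScript",
    b"/Launch", b"/EmbeddedFile", b"/Filespec", b"/Sig",
)


def count_pdf_names(data: bytes) -> dict:
    """Count occurrences of each risky name in the raw, unparsed file bytes."""
    counts = {}
    for name in RISKY_NAMES:
        # A PDF name ends at whitespace or a delimiter, so /Sig must not match /SigFlags.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9_.#-])"
        counts[name.decode("ascii")] = len(re.findall(pattern, data))
    return counts
```

a zero count here is not a clean bill of health; it only means nothing was visible in plain bytes.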
2. pdf forensics
full PDF scan: xref section count, incremental-update anomalies, javascript with eval(), embedded file extensions, Info vs visible content mismatches.
why second: confirms whether you are looking at a stitched incremental PDF or a single-revision export, and surfaces high-risk objects the object explorer flagged.
3. pdf incremental update analyzer
parses every xref trailer in the chain. lists new and modified objects per revision: post-signature annotations, updated Info dictionaries, /Prev pointers.
why third: post-signature edits in PDF are almost always incremental appends. this is the smoking gun when the signature badge still looks valid.
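the idea of splitting a file into revisions can be sketched by treating each `%%EOF` marker as the end of one revision and listing the objects and `/Prev` pointers written in each segment. this is a simplified sketch with hypothetical function names, not the analyzer's implementation. known limits: linearized PDFs legitimately carry two `%%EOF` markers, the marker can also occur inside stream data, and objects packed in object streams are not listed.

```python
import re

OBJ_HEADER = re.compile(rb"(?:^|[\r\n])(\d+)[ \t]+\d+[ \t]+obj\b")
PREV_POINTER = re.compile(rb"/Prev\s+(\d+)")


def revision_boundaries(data: bytes) -> list:
    """End offset of every %%EOF marker; each one closes a candidate revision."""
    return [m.end() for m in re.finditer(rb"%%EOF", data)]


def summarize_revisions(data: bytes) -> list:
    """Per revision: byte span, object numbers defined there, and /Prev offsets."""
    rows, start = [], 0
    for index, end in enumerate(revision_boundaries(data), 1):
        segment = data[start:end]
        rows.append({
            "revision": index,
            "bytes": (start, end),
            "objects": [int(n) for n in OBJ_HEADER.findall(segment)],
            "prev": [int(n) for n in PREV_POINTER.findall(segment)],
        })
        start = end
    return rows
```

an object number that appears in revision one and again in a later revision was redefined by the append; that is the list to compare against the signed byte range.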
4. pdf author revision metadata analyzer
snapshots Info and XMP across xref sections. compares Author, Producer, ModDate, and document ID pairs revision to revision.
why fourth: the thorne fixture shows Author flipping Mina Patel → Ethan Kline and Producer Adobe PDF Library → PDFium after the signature, which is metadata genealogy inside the PDF itself.
5. pdf digital signature chain analyzer
byte-range coverage, signed vs unsigned appended bytes, signer dictionary fields. structural analysis only, not full RSA/OCSP validation.
why fifth: in the thorne fixture, 545 bytes appended after signed coverage means the file changed after Jordan Blake signed. the badge can lie; byte range cannot.
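the byte-range arithmetic is simple enough to show. a signature dictionary's `/ByteRange` is `[offset1 length1 offset2 length2]`: two signed spans with a gap between them that holds the `/Contents` hex string. anything past `offset2 + length2` is outside that signature. the sketch below is an assumption-level illustration with hypothetical names (`signature_coverage` and its report keys); it does no cryptographic verification, and a file with several signatures needs each range read against the others.

```python
import re

BYTE_RANGE = re.compile(rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]")


def signature_coverage(data: bytes) -> list:
    """Report, for each /ByteRange found, how many trailing bytes it leaves unsigned."""
    reports = []
    for match in BYTE_RANGE.finditer(data):
        off1, len1, off2, len2 = (int(group) for group in match.groups())
        signed_end = off2 + len2
        reports.append({
            "byte_range": [off1, len1, off2, len2],
            "contents_gap": (off1 + len1, off2),  # hole reserved for the signature blob
            "signed_end": signed_end,
            "unsigned_tail_bytes": len(data) - signed_end,
            "covers_whole_file": off1 == 0 and signed_end == len(data),
        })
    return reports
```

a nonzero `unsigned_tail_bytes` on the last signature in the file is the structural finding; whether the signature itself verifies is a separate question for a PKI verifier.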
6. document version ghost extractor
carves recoverable text from legacy OLE .doc free FAT sectors and stream tail overrun. surfaces deleted clauses and contact strings left in unallocated space.
why sixth: pre-2007 Word binaries retain ghost text in sectors the editor no longer references. redlines survive after visible delete.
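free-sector carving can be sketched directly from the compound file layout: read the sector size from the header, load the FAT through the header DIFAT, and pull printable runs out of every sector the FAT marks as free. this is a minimal sketch with hypothetical names, not the extractor's code. assumptions: only the 109 FAT locations held in the header are followed (roughly the first 6.8 MB at 512-byte sectors), stream tail overrun is not covered, and text is matched only as printable ASCII or basic UTF-16LE.

```python
import re
import struct

MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # compound file signature
FREESECT = 0xFFFFFFFF                      # FAT value for an unallocated sector
MAX_REGULAR_SECTOR = 0xFFFFFFFA            # larger values are special markers


def free_sector_strings(data: bytes, min_len: int = 6) -> list:
    """Return (sector_index, text) for printable runs found in FAT-free sectors."""
    if data[:8] != MAGIC:
        raise ValueError("not an OLE compound file")
    sector_size = 1 << struct.unpack_from("<H", data, 30)[0]
    total_sectors = len(data) // sector_size - 1  # the header fills the first slot

    # Header DIFAT at offset 76 lists the sectors that hold the FAT itself.
    difat = [s for s in struct.unpack_from("<109I", data, 76) if s <= MAX_REGULAR_SECTOR]
    fat = []
    for sector in difat:
        offset = (sector + 1) * sector_size
        fat.extend(struct.unpack_from(f"<{sector_size // 4}I", data, offset))

    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    hits = []
    for index, entry in enumerate(fat):
        if entry != FREESECT or index >= total_sectors:
            continue  # allocated sector, or a FAT slot past the end of the file
        start = (index + 1) * sector_size
        chunk = data[start:start + sector_size]
        hits += [(index, m.group().decode("ascii")) for m in ascii_run.finditer(chunk)]
        hits += [(index, m.group().decode("utf-16-le")) for m in utf16_run.finditer(chunk)]
    return hits
```

run it on a preserved copy only; opening and saving the file in Word is exactly what discards these sectors.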
7. document metadata genealogy tracer
drop every related branch: signed PDF, tracked draft DOCX, clean export DOCX, legacy .doc. links template clusters, author edges, creator_transfer flags, near_zero_edit_time.
why seventh: forgery disputes are multi-file. the PDF Author can disagree with the draft creator while both share ThorneContractTemplate.dotx lineage.
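the DOCX side of that comparison reads two small XML parts, `docProps/core.xml` and `docProps/app.xml`, from each branch and diffs the fields. the sketch below assumes the bytes have already been read from a preserved copy; `docx_metadata`, `differing_fields`, and the output keys are hypothetical names, and the flag logic of the tracer itself (creator_transfer, near_zero_edit_time) is not reproduced here.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}
PARTS = {
    "docProps/core.xml": {
        "creator": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
        "revision": "cp:revision",
        "created": "dcterms:created",
        "modified": "dcterms:modified",
    },
    "docProps/app.xml": {
        "template": "ep:Template",
        "total_time_min": "ep:TotalTime",  # editing time in minutes, stored as text
        "application": "ep:Application",
    },
}


def docx_metadata(docx_bytes: bytes) -> dict:
    """Pull author, revision, template, and edit-time fields from one DOCX branch."""
    out = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        names = set(archive.namelist())
        for part, fields in PARTS.items():
            if part not in names:
                continue
            root = ET.fromstring(archive.read(part))
            for key, tag in fields.items():
                out[key] = root.findtext(tag, default=None, namespaces=NS)
    return out


def differing_fields(branches: dict) -> dict:
    """Map field -> {branch: value} for every field that is not identical everywhere."""
    keys = sorted({key for meta in branches.values() for key in meta})
    return {
        key: {name: meta.get(key) for name, meta in branches.items()}
        for key in keys
        if len({meta.get(key) for meta in branches.values()}) > 1
    }
```

feed it one entry per branch, for example `differing_fields({"draft": docx_metadata(a), "export": docx_metadata(b)})`; a shared template with a different creator is the kind of edge worth charting.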
8. tracked changes forensic reconstructor
rebuilds w:ins, w:del, w:moveFrom/w:moveTo with author, timestamp, and deleted-text inventory from the Word draft.
why last: the clean PDF export hides wire-transfer deletions and arbitration venue moves that only exist in the tracked draft branch.
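a tracked-change inventory can be sketched by walking `word/document.xml` and collecting every `w:ins`, `w:del`, `w:moveFrom`, and `w:moveTo` element with its author, date, and text. this is a minimal sketch with a hypothetical function name, not the reconstructor itself. assumptions: only the main document part is read (no headers, footnotes, or comments), and paragraph-mark or table-row change markers show up as rows with empty text.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
CHANGE_TAGS = {f"{{{W}}}{name}": name for name in ("ins", "del", "moveFrom", "moveTo")}
TEXT_TAGS = (f"{{{W}}}t", f"{{{W}}}delText")  # live text and deleted text


def tracked_changes(docx_bytes: bytes) -> list:
    """List tracked insertions, deletions, and moves from the main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    rows = []
    for element in root.iter():
        kind = CHANGE_TAGS.get(element.tag)
        if kind is None:
            continue
        text = "".join(node.text or "" for node in element.iter() if node.tag in TEXT_TAGS)
        rows.append({
            "kind": kind,
            "author": element.get(f"{{{W}}}author"),
            "date": element.get(f"{{{W}}}date"),
            "text": text,
        })
    return rows
```

sorting the rows by `date` gives a first-pass edit timeline; the timestamps come from the editing machine's clock, so treat them as claims to corroborate rather than facts.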
common false leads
- the signature badge is green so the document is unchanged — incremental appends after signing do not remove the badge. check byte-range coverage and xref section count.
- Author metadata matches the expected signer — Info dictionaries are editable per revision. the thorne PDF shows Mina Patel in revision one and Ethan Kline after the post-signature update.
- the clean PDF export is the source of truth — it is a lossy branch. tracked wire-transfer deletions and arbitration venue moves live in the draft DOCX, not the export.
- low revision number means no edits: revision fields reset on template re-save, so a low count proves little. near_zero_edit_time on DOCX app metadata flags exports with implausible edit duration vs content change.
- print-to-PDF looks identical so analysis is unnecessary — flattening destroys the incremental xref chain and may strip signature byte-range evidence.
- the email was DKIM-valid so the attachment is authentic — transport integrity does not prove document integrity. a legit thread can carry an incrementally edited PDF.
what we can tell you, what we can't
we can tell you:
- whether a PDF has incremental updates after a /Sig dictionary — xref count, /Prev chain, appended bytes
- metadata genealogy across PDF revisions and DOCX branches — author, producer, template, revision drift
- tracked-change reconstruction — who inserted, deleted, or moved text and when
- recoverable ghost text and stream residue in legacy OLE Word binaries
- structural signature coverage — signed byte range vs unsigned tail (not full PKI validation)
- high-risk PDF objects — javascript, eval(), embedded executables, suspicious OpenAction
we can't tell you:
- whether a PKCS#7 signature cryptographically verifies — that requires certificate chain, OCSP/CRL, and a trusted verifier
- contract enforceability, fraud intent, or who physically sat at the keyboard
- content of edits that were never tracked and never hit unallocated sectors
- versions that only existed on a machine you do not have an image of
- legal admissibility rulings — counsel and the court decide that
handing it off
- outside counsel: sha-256 manifest, incremental-update report, signature byte-range summary, tracked-change reconstruction, metadata genealogy chart across all branches.
- opposing counsel / discovery: preserved originals only — not re-saved copies. include the draft branch if tracked changes are in dispute.
- digital forensics vendor: full workstation or mailbox imaging if the edit source is unknown and local artifacts may exist outside the delivered files.
- PKI expert / court-appointed examiner: CMS signature verification, certificate trust chain, timestamp authority validation — beyond structural analysis.
- insurer / counterparty: timeline of which version was presented for signature vs which version was paid against, with file hashes.
further reading
reference investigation
synthetic fixture thorne-document-forgery — Thorne Services contract authenticity dispute: signed PDF with post-signature incremental edit and 545 unsigned appended bytes, DOCX metadata genealogy mismatch across draft and export branches, tracked wire-transfer deletion, legacy OLE ghost text in a free FAT sector. seed thorne-document-forgery:v1. compare output via npm run check:flagship.
proof page: /forensics/proof/thorne-document-forgery · fixture download: evidence zip · case playbook: case type tools