Document Analysis

FDA Guidance Diff

Semantic document comparison that classifies regulatory changes. Validated against law firm analyses: 90-100% accuracy on modern FDA guidance documents.

122 changes detected
98 high impact
3 document pairs
Total Changes: 45
High Impact: 43
Change Breakdown
NEW (33)
EXPANDED (3)
RESTRUCTURED (3)
REMOVED (6)
What do these change types mean?
NEW — Content only in newer document
REMOVED — Content only in older document
STRICTER — Requirements more stringent
MORE_LENIENT — Requirements less stringent
EXPANDED — Same topic, more detail added
CLARIFICATION — Same meaning, clearer language
RESTRUCTURED — Content moved/reorganized
EDITORIAL — Cosmetic changes only
EXPANDED · HIGH IMPACT

II. Scope

The scope of applicable devices and submission types has been significantly broadened to include more device characteristics, additional submission types, and devices that do not require a premarket submission.

Before (p. 4)

2. Scope This guidance provides recommendations to consider and information to include in FDA medical device premarket submissions for effective cybersecurity management. Effective cybersecurity management is intended to reduce the risk to patient…

After (pp. 6-7)

II. Scope This guidance is applicable to devices with cybersecurity considerations, including but not limited to devices that include a device software function[1] or that contain software (including firmware) or programmable logic. The guidance is…
EXPANDED · HIGH IMPACT

1. Implementation of Security Controls

The new version significantly expands on the requirement for manufacturers to integrate cybersecurity into the design process, explicitly referencing specific CFR sections and detailing the need for design inputs, outputs, and acceptance criteria for security features.

Before (pp. 5-6)

4. General Principles Manufacturers should develop a set of cybersecurity controls to assure medical device cybersecurity and maintain medical device functionality and safety. FDA recognizes that medical device security is a shared responsibili…

After (pp. 25-26)

1. Implementation of Security Controls FDA considers the way in which a device addresses cybersecurity risks and the way in which the device responds when exposed to cybersecurity threats as functions of the device design. Effective cybersecurity…
EXPANDED · HIGH IMPACT

B. Designing for Security

The new version expands on the cybersecurity functions by defining specific security objectives (Authenticity, Authorization, Availability, Confidentiality, and Secure and timely updatability and patchability) that manufacturers must address in their device design, moving beyond the general 'Identify, Protect, Detect, Respond, and Recover' framework.

Before (pp. 6-8)

5. Cybersecurity Functions The Agency recommends that medical device manufacturers consider the following cybersecurity framework core functions to guide their cybersecurity activities: Identify, Protect, Detect, Respond, and Recover.[5] Identify…

After (p. 11)

B. Designing for Security When reviewing premarket submissions, FDA intends to assess device cybersecurity based on a number of factors, including, but not limited to, the device’s ability to provide and implement the security objectives below th…

How It Works

The pipeline extracts text from FDA guidance PDFs, splits it into chunks along the section hierarchy, aligns chunks across document versions using BM25 retrieval, then classifies each matched pair with an LLM; chunks with no counterpart in the other version are labeled NEW or REMOVED. The result is a structured diff that tells you not just what changed, but how: stricter requirements, new content, clarifications, or removals.
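Chunking by section hierarchy can be sketched as follows. This is a hypothetical illustration, not the project's `FDAChunker`: it assumes three header levels of the kind visible in the excerpts above (Roman numerals such as "II. Scope", capital letters such as "B. Designing for Security", Arabic numerals such as "1. Implementation of Security Controls"). The names `HEADER_PATTERNS`, `Chunk`, and `parent_child_chunks` are made up for this sketch. Each chunk records the headers of its parent sections, so a child chunk keeps its place in the document.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical header patterns, checked in order, so "I." is read as a Roman numeral.
HEADER_PATTERNS = [
    (1, re.compile(r"^[IVX]+\.\s+\S")),  # "II. Scope"
    (2, re.compile(r"^[A-Z]\.\s+\S")),   # "B. Designing for Security"
    (3, re.compile(r"^\d+\.\s+\S")),     # "1. Implementation of Security Controls"
]


@dataclass
class Chunk:
    path: tuple[str, ...]  # headers from the top-level section down to this one
    text: str              # body text directly under the deepest header


def header_level(line: str) -> Optional[int]:
    for level, pattern in HEADER_PATTERNS:
        if pattern.match(line):
            return level
    return None


def parent_child_chunks(text: str) -> list[Chunk]:
    chunks: list[Chunk] = []
    path: list[str] = []
    body: list[str] = []

    def flush() -> None:
        # Emit the body collected under the current header path, if any.
        # Text before the first header is dropped in this sketch.
        if path and body:
            chunks.append(Chunk(tuple(path), " ".join(body)))
        body.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        level = header_level(line)
        if level is None:
            body.append(line)
            continue
        flush()
        del path[level - 1:]  # pop back to this header's parent level
        path.append(line)
    flush()
    return chunks


sample = "\n".join([
    "I. Introduction", "Intro text.",
    "II. Scope", "Scope text.",
    "A. Example Subsection", "Sub text.",
    "1. Example Item", "Item text one.", "Item text two.",
])
for chunk in parent_child_chunks(sample):
    print(" > ".join(chunk.path), "|", chunk.text)
```

A regex-only header detector like this one depends on consistent numbering; a numbered list item inside body text would be misread as a header.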

Key Technical Decisions

| Component | Choice | Why |
| --- | --- | --- |
| Alignment | BM25 (sparse) | 0.915 MRR on 116 labeled queries. Regulatory terminology is stable across years, so keyword matching outperformed embeddings. |
| Chunking | Parent-child | Best MRR (0.862) while preserving document hierarchy. 53 chunks vs. 545 for the fixed-size baseline. |
| Classification | Gemini 2.5 Flash | Consistent taxonomy output. 100% valid JSON on classification calls. |
| Thresholds | MATCH=15.0, NO_MATCH=5.0 | Tuned on ground truth; scores above 15 reliably indicate the same content. |
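The alignment step and its two thresholds can be illustrated with a from-scratch Okapi BM25 scorer. This is a sketch under assumptions, not the project's `BM25Index.align_chunks()`: the tokenizer, the BM25 parameters (`k1=1.5`, `b=0.75`), the Lucene-style IDF, the hypothetical `SimpleBM25` and `align_chunks` names, and the `UNCERTAIN` label for scores between the two thresholds are all choices made here for illustration. Each old chunk serves as a query against an index of the new chunks, and its best-scoring new chunk counts as a match only above `MATCH`.

```python
import math
import re
from collections import Counter

MATCH, NO_MATCH = 15.0, 5.0  # threshold values quoted in the table above


def tokenize(text: str) -> list[str]:
    # Lowercased alphanumeric tokens; a real pipeline might also drop stopwords.
    return re.findall(r"[a-z0-9]+", text.lower())


class SimpleBM25:
    """Okapi BM25 over a fixed, non-empty list of documents (the new-version chunks)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = (sum(self.lengths) / len(docs)) or 1.0
        df = Counter(term for tf in self.tfs for term in tf)
        n = len(docs)
        # Lucene-style IDF: log(1 + (N - df + 0.5) / (df + 0.5)), never negative.
        self.idf = {t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], self.lengths[i]
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total


def align_chunks(old: list[str], new: list[str],
                 match: float = MATCH, no_match: float = NO_MATCH):
    """Return (old_idx, new_idx or None, best_score, status) per old chunk,
    plus the indices of new chunks that no old chunk matched."""
    index = SimpleBM25(new)
    results, matched_new = [], set()
    for i, text in enumerate(old):
        scores = [index.score(text, j) for j in range(len(new))]
        j = max(range(len(new)), key=scores.__getitem__)
        best = scores[j]
        if best > match:
            status = "MATCH"
            matched_new.add(j)
        elif best < no_match:
            status, j = "NO_MATCH", None  # candidate REMOVED
        else:
            status = "UNCERTAIN"  # between thresholds; resolving this is an assumption
        results.append((i, j, best, status))
    unmatched_new = [j for j in range(len(new)) if j not in matched_new]  # candidate NEW
    return results, unmatched_new


old = ["cybersecurity controls for device design inputs and outputs",
       "labeling requirements for legacy products"]
new = ["security controls implemented through design inputs outputs and "
       "acceptance criteria for device cybersecurity",
       "threat modeling for connected devices",
       "software bill of materials"]
# Toy strings score far below section-length chunks, so the demo lowers both thresholds.
results, unmatched = align_chunks(old, new, match=4.0, no_match=1.0)
for row in results:
    print(row)
print("unmatched new chunks:", unmatched)
```

With the quoted defaults (15.0 and 5.0), the same toy input leaves the first old chunk `UNCERTAIN`: a BM25 score is a sum over query terms, and these strings are only a few words long. The thresholds only make sense at the scale of real section-length chunks.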

Validation Results

Validated against law firm analyses (King & Spalding cybersecurity and software premarket reviews).

| Document Pair | Gap | Detection | Type Accuracy |
| --- | --- | --- | --- |
| Cybersecurity 2014→2023 | 9 years | 6/6 (100%) | 6/6 (100%) |
| PCCP AI 2023→2024 | 1.5 years | 9/10 (90%) | 9/9 (100%) |
| Software Premarket 2005→2023 | 18 years | 3/4 (75%) | 2/3 (67%) |

Pairs of modern FDA documents (post-2010) with numbered section headers reach 90-100% detection. Older documents with inconsistent formatting are harder: detection falls to 75% on the 2005 software guidance.
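The two accuracy columns in the validation table appear to use different denominators. Detection is presumably measured against the changes identified in the law firm analyses, while type accuracy seems to be measured only over the changes the system actually detected (9/9, not 9/10). That reading is inferred from the counts rather than stated on this page. The sketch below reproduces the table's percentages under it; `rates` and `VALIDATION` are hypothetical names.

```python
def rates(expected: int, detected: int, correctly_typed: int) -> tuple[float, float]:
    """Detection over expected changes; type accuracy over detected changes only."""
    detection = detected / expected
    type_accuracy = correctly_typed / detected if detected else 0.0
    return detection, type_accuracy


# (expected, detected, correctly typed), read off the validation table
VALIDATION = {
    "Cybersecurity 2014→2023": (6, 6, 6),
    "PCCP AI 2023→2024": (10, 9, 9),
    "Software Premarket 2005→2023": (4, 3, 2),
}

for pair, counts in VALIDATION.items():
    detection, type_accuracy = rates(*counts)
    print(f"{pair}: detection {detection:.0%}, type accuracy {type_accuracy:.0%}")
```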

Change Taxonomy

The system classifies changes into eight types, separating what the LLM determines (matched content) from what alignment determines (unmatched content):

LLM-Classified (Matched Pairs)

  • STRICTER — Requirements more stringent
  • MORE_LENIENT — Requirements relaxed
  • EXPANDED — Same topic, more detail
  • CLARIFICATION — Same meaning, clearer
  • RESTRUCTURED — Content moved
  • EDITORIAL — Cosmetic only

Alignment-Determined (No Match)

  • NEW — Content only in newer document
  • REMOVED — Content only in older document
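The split between LLM-classified and alignment-determined types suggests a guard on the classifier's output: any label the LLM returns must be one of the six matched-pair types. Here is a minimal sketch. It assumes the model is asked to reply with a JSON object holding a hypothetical `change_type` field; the project's actual prompt and schema are not shown on this page, and `ChangeType` and `parse_classification` are illustrative names.

```python
import json
from enum import Enum


class ChangeType(str, Enum):
    # Alignment-determined (no counterpart in the other version)
    NEW = "NEW"
    REMOVED = "REMOVED"
    # LLM-classified (matched pairs)
    STRICTER = "STRICTER"
    MORE_LENIENT = "MORE_LENIENT"
    EXPANDED = "EXPANDED"
    CLARIFICATION = "CLARIFICATION"
    RESTRUCTURED = "RESTRUCTURED"
    EDITORIAL = "EDITORIAL"


ALIGNMENT_TYPES = {ChangeType.NEW, ChangeType.REMOVED}
LLM_TYPES = set(ChangeType) - ALIGNMENT_TYPES


def parse_classification(raw: str) -> ChangeType:
    """Validate one LLM reply such as '{"change_type": "STRICTER"}'."""
    data = json.loads(raw)                   # json.JSONDecodeError if not JSON
    label = ChangeType(data["change_type"])  # KeyError if missing, ValueError if unknown
    if label not in LLM_TYPES:
        raise ValueError(f"{label.value} is set by alignment, not by the LLM")
    return label


print(parse_classification('{"change_type": "CLARIFICATION"}').value)
```

Rejecting NEW and REMOVED at this point keeps the two sources of labels separate, so a matched pair can never be reported as wholly new or deleted content.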

Architecture

PDF → FDAChunker (parent_child) → Chunks
                                           ↓
Old chunks + New chunks → BM25Index.align_chunks() → ChunkMatches
                                                          ↓
ChunkMatches → LLM classification → ClassifiedChanges → JSON
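The last stage of the diagram, from ChunkMatches to JSON, can be sketched as a routing function. The dataclass fields, the `classify_changes` name, and the injected `llm_classify` callable (standing in for the Gemini 2.5 Flash call) are all hypothetical. Only the routing rule comes from the taxonomy above: chunks with no counterpart become NEW or REMOVED without an LLM call, and matched pairs go to the classifier.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class ChunkMatch:
    old_text: Optional[str]  # None: content appears only in the newer document
    new_text: Optional[str]  # None: content appears only in the older document
    score: float             # alignment score for the pair


@dataclass
class ClassifiedChange:
    change_type: str
    old_text: Optional[str]
    new_text: Optional[str]


def classify_changes(matches: list[ChunkMatch],
                     llm_classify: Callable[[str, str], str]) -> str:
    """Route each match to a change type and serialize the result as JSON."""
    changes = []
    for m in matches:
        if m.old_text is None and m.new_text is None:
            raise ValueError("a match needs at least one side")
        if m.old_text is None:
            label = "NEW"
        elif m.new_text is None:
            label = "REMOVED"
        else:
            label = llm_classify(m.old_text, m.new_text)  # only matched pairs reach the LLM
        changes.append(ClassifiedChange(label, m.old_text, m.new_text))
    return json.dumps([asdict(c) for c in changes], indent=2)


# Placeholder classifier for the demo; a real pipeline would call the LLM here.
def stub_classifier(old: str, new: str) -> str:
    return "EXPANDED" if len(new) > len(old) else "EDITORIAL"


# Toy inputs and scores, for illustration only.
matches = [
    ChunkMatch("Scope: premarket submissions.",
               "Scope: premarket submissions and devices without one.", 18.2),
    ChunkMatch(None, "Security objectives.", 0.0),
    ChunkMatch("General principles.", None, 2.1),
]
print(classify_changes(matches, stub_classifier))
```

Injecting the classifier as a callable keeps the routing logic testable without network calls. It also makes it easy to check that unmatched chunks never reach the LLM.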
