Medical NLP & Documentation
AI-Powered Clinical Notes, Coding & Compliance
The Documentation Crisis
American physicians spend an average of 16 minutes per patient encounter on documentation — nearly as long as the encounter itself. After clinic hours, many spend another 1-2 hours finishing notes, responding to patient messages, and completing billing paperwork. Physicians call this "pyjama time" — the unpaid after-hours documentation work that is a leading driver of burnout.
The problem is not laziness. The problem is that modern clinical documentation must serve multiple masters simultaneously: the patient's medical record, billing and insurance, quality reporting, legal defensibility, and regulatory compliance. A single office visit for a diabetic patient with hypertension might generate a note that satisfies the patient's care needs, supports a billing code (CPT 99214), meets quality measure documentation requirements (HbA1c screening, foot exam, eye exam referral), and stands up to an insurance audit.
AI-powered natural language processing (NLP) is one of the most promising technologies for breaking this cycle. NLP is the branch of AI that deals with understanding and generating human language: in this case, the language of clinical medicine.
SOAP Notes: The Universal Format
Clinical notes in the United States and many other Western countries commonly follow the SOAP format, a structured framework that organises clinical thinking:
| Section | Stands For | What Goes Here | Example |
|---|---|---|---|
| S | Subjective | What the patient tells you — their symptoms, concerns, history | "I've had a cough for two weeks. It's worse at night. No fever. I quit smoking three years ago." |
| O | Objective | What you observe and measure — vital signs, exam findings, lab results | BP 138/82, HR 76, lungs clear bilaterally, SpO2 98% on room air |
| A | Assessment | Your clinical interpretation: diagnoses, differential diagnoses | Chronic cough, likely post-nasal drip vs GERD. Low suspicion for malignancy given clear lung exam, no fever, and 3-year smoking cessation. |
| P | Plan | What you will do — medications, tests, referrals, follow-up | Start nasal corticosteroid spray. Trial of omeprazole 20mg daily for 4 weeks. Chest X-ray if no improvement. Follow-up in 6 weeks. |
SOAP notes are deceptively simple in structure but complex in execution. A good SOAP note tells the story of the clinical encounter in a way that another physician could pick up the chart six months later and understand exactly what happened and why.
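The four SOAP sections map naturally onto a small data structure. The sketch below is only an illustration: the `SoapNote` class, its field names, and the `missing_sections` helper are hypothetical and are not taken from any EHR product or from the course data files.

```python
from dataclasses import dataclass, field


@dataclass
class SoapNote:
    """Hypothetical container for one SOAP note; field names are illustrative."""
    subjective: str
    objective: str
    assessment: str
    plan: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are empty or whitespace-only."""
        missing = []
        for name in ("subjective", "objective", "assessment"):
            if not getattr(self, name).strip():
                missing.append(name)
        if not self.plan:
            missing.append("plan")
        return missing

    def render(self) -> str:
        """Format the note as plain text, one labelled section per line."""
        return "\n".join([
            f"S: {self.subjective}",
            f"O: {self.objective}",
            f"A: {self.assessment}",
            f"P: {'; '.join(self.plan)}",
        ])


# Example values are shortened from the table above.
note = SoapNote(
    subjective="Cough for two weeks, worse at night. No fever.",
    objective="BP 138/82, HR 76, lungs clear bilaterally, SpO2 98% on room air",
    assessment="Chronic cough, likely post-nasal drip vs GERD.",
    plan=["Start nasal corticosteroid spray", "Follow-up in 6 weeks"],
)
print(note.render())
```

Keeping the plan as a list rather than free text is a design choice for the sketch: it makes individual orders easier to check or count later, at the cost of some narrative flexibility.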
How Ambient AI Documentation Works
The most transformative application of medical NLP is ambient clinical documentation — AI that listens to the doctor-patient conversation and automatically generates a structured SOAP note.
Nuance DAX (Dragon Ambient eXperience) by Microsoft is the market leader. Here is how it works:
Physicians using DAX report saving 5-7 minutes per encounter and spending 50% less time on after-hours documentation. For a primary care physician seeing 20-25 patients daily, that works out to roughly 1.7 to 2.9 hours of reclaimed time per day.
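The daily figure follows from simple multiplication of the reported per-encounter saving by the patient load. A quick check, using only the numbers quoted above (the function name is illustrative):

```python
def daily_minutes_saved(minutes_per_encounter: float, patients_per_day: int) -> float:
    """Total documentation minutes saved across one clinic day."""
    return minutes_per_encounter * patients_per_day


low = daily_minutes_saved(5, 20)    # smallest saving, fewest patients
high = daily_minutes_saved(7, 25)   # largest saving, most patients

print(f"{low:.0f} to {high:.0f} minutes per day")          # 100 to 175 minutes per day
print(f"{low / 60:.1f} to {high / 60:.1f} hours per day")  # 1.7 to 2.9 hours per day
```

The range is wide because both inputs are ranges; the low end assumes the smallest saving on the lightest clinic day.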
Other ambient documentation tools include Abridge (used by several large health systems), Suki (voice-enabled AI assistant), and DeepScribe.
> Look at data/soap-note-samples.json for the ambient documentation examples used in the sandbox exercises.
Medical Coding: ICD-10-CM and CPT
Every clinical encounter in the United States must be translated into standardised codes for billing and reporting purposes. This translation — called "medical coding" — is one of the most tedious, error-prone, and consequential steps in healthcare delivery.
ICD-10-CM (Diagnoses)
The International Classification of Diseases, 10th Revision, Clinical Modification is the diagnosis coding system used in the US. It contains over 72,000 codes that describe every conceivable medical condition with extraordinary specificity.
| Code | Description | Level of Detail |
|---|---|---|
| E11 | Type 2 diabetes mellitus | General |
| E11.3 | Type 2 diabetes with ophthalmic complications | Specific |
| E11.311 | Type 2 diabetes with unspecified diabetic retinopathy with macular oedema | Very specific |
| E11.3211 | Type 2 diabetes with mild nonproliferative diabetic retinopathy with macular oedema, right eye | Extremely specific |
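The table shows that each additional character narrows the diagnosis. That prefix structure can be sketched in a few lines. This is an illustration only: the function names are hypothetical, no lookup against the real code set is performed, and an intermediate prefix is not necessarily a valid or billable code on its own.

```python
def icd10cm_levels(code: str) -> list[str]:
    """List the progressively more specific prefixes of an ICD-10-CM code.

    Assumes the usual layout: a 3-character category, a dot, then up to
    4 further characters. The code itself is the last element returned.
    """
    category, _, extension = code.partition(".")
    levels = [category]
    for i in range(1, len(extension) + 1):
        levels.append(f"{category}.{extension[:i]}")
    return levels


def is_more_specific(candidate: str, parent: str) -> bool:
    """True if `candidate` strictly refines `parent` under the prefix rule."""
    return candidate != parent and parent in icd10cm_levels(candidate)


print(icd10cm_levels("E11.3211"))
# ['E11', 'E11.3', 'E11.32', 'E11.321', 'E11.3211']
print(is_more_specific("E11.3211", "E11.311"))
# False: E11.311 is a sibling branch, not a parent
```

A coding assistant could use a check like `is_more_specific` to notice when a note supports a narrower code than the one selected, although a production tool would validate every candidate against the official code set.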
Getting the code right matters financially. An under-coded visit leaves revenue on the table. An over-coded visit, meaning one billed as more complex than the encounter that actually occurred, can be treated as fraud and can trigger audits, civil penalties under the False Claims Act, and in serious cases criminal prosecution.
CPT Codes (Procedures and Services)
Current Procedural Terminology codes describe what the provider did — the service or procedure. For office visits, the key CPT codes are the Evaluation & Management (E/M) codes:
| CPT Code | Visit Level | Typical Scenario | Approximate Reimbursement (Medicare) |
|---|---|---|---|
| 99213 | Low complexity | Follow-up for stable hypertension, medication refill | $92-110 |
| 99214 | Moderate complexity | New symptom workup, medication adjustment, 2+ chronic conditions | $130-155 |
| 99215 | High complexity | Multiple complex problems, extensive counselling, new diagnosis | $175-210 |
AI coding assistants analyse the clinical note and suggest the appropriate ICD-10-CM and CPT codes. They look for documentation elements that support a given code level — for example, the number of chronic conditions addressed, the complexity of medical decision-making, and whether the physician documented time spent on counselling.
3M and Optum are major players in AI-assisted coding, with tools that analyse physician notes and flag coding opportunities (where documentation supports a higher-complexity code than the one selected) and compliance risks (where the documentation does not support the selected code).
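The two kinds of flag described above reduce to a comparison between the code that was selected and the level the documentation supports. The sketch below shows only that comparison. It assumes the supported level has already been produced by an upstream NLP step that is not shown, and the names `Flag` and `review_em_code` are hypothetical rather than any vendor's API.

```python
from enum import Enum

# E/M office-visit codes from the table above, ordered by complexity.
EM_LEVELS = ["99213", "99214", "99215"]


class Flag(Enum):
    OK = "documentation matches the selected code"
    OPPORTUNITY = "documentation supports a higher-complexity code"
    COMPLIANCE_RISK = "documentation does not support the selected code"


def review_em_code(selected: str, supported: str) -> Flag:
    """Compare the selected E/M code with the level the note supports.

    `supported` is assumed to come from a separate analysis of the note;
    this function performs the comparison only.
    """
    selected_rank = EM_LEVELS.index(selected)
    supported_rank = EM_LEVELS.index(supported)
    if supported_rank > selected_rank:
        return Flag.OPPORTUNITY
    if supported_rank < selected_rank:
        return Flag.COMPLIANCE_RISK
    return Flag.OK


print(review_em_code(selected="99213", supported="99214").value)
print(review_em_code(selected="99215", supported="99214").value)
```

In practice either flag would go to a human coder or the physician for review rather than changing the code automatically.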
Discharge Summaries and Care Transitions
When a patient leaves the hospital, they receive a discharge summary — a document that summarises their hospital stay, diagnoses, procedures, medications at discharge, and follow-up instructions. In the US, the Joint Commission (the primary hospital accreditation body) requires discharge summaries to include:
Discharge summaries are critical for care transitions, when the patient moves from hospital to primary care, a skilled nursing facility, or home health. Poorly written discharge summaries are a recognised contributor to preventable readmissions, because the patient's primary care physician may not know about a medication change, a new diagnosis, or a pending test result.
AI-generated discharge summaries pull structured data from the EHR (admission notes, daily progress notes, lab results, medication orders) and compile it into a coherent narrative. The physician still reviews and signs the document, but the AI removes much of the manual work of searching through days of records to piece together the story.
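A minimal sketch of the compile step, assuming the structured data has already been extracted into a plain dictionary. The `record` layout and the `build_discharge_draft` function are hypothetical: a real EHR export would look different, and production tools generate the narrative with a language model rather than fixed templates.

```python
def build_discharge_draft(record: dict) -> str:
    """Assemble a plain-text draft from structured fields for physician review."""
    med_lines = []
    for med in record["medications"]:
        if med["admission_dose"] != med["discharge_dose"]:
            # Surface dose changes explicitly so they are not missed downstream.
            med_lines.append(
                f"{med['name']}: changed from {med['admission_dose']} "
                f"to {med['discharge_dose']}"
            )
        else:
            med_lines.append(f"{med['name']}: unchanged ({med['discharge_dose']})")

    sections = [
        f"Admitted: {record['admission_date']} for {record['reason']}",
        "Hospital course: " + " ".join(record["course_events"]),
        "Discharge medications: " + "; ".join(med_lines),
        "Follow-up: " + "; ".join(record["follow_up"]),
        "Status: DRAFT, pending physician review and signature",
    ]
    return "\n".join(sections)


# Illustrative sample record, not real patient data.
sample = {
    "admission_date": "6/2/2026",
    "reason": "acute decompensated heart failure",
    "course_events": ["Diuresed with IV furosemide 80mg BID."],
    "medications": [
        {"name": "Furosemide", "admission_dose": "40mg daily", "discharge_dose": "60mg daily"},
    ],
    "follow_up": ["Cardiology in 7 days"],
}
print(build_discharge_draft(sample))
```

The draft ends with an explicit review status because the physician remains responsible for the signed document.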
An Example: AI-Generated vs Manual
Manual discharge summary (typical):
> "Patient is a 72 y/o male admitted for CHF exacerbation. Diuresed with IV Lasix. Improved. Discharge home on increased oral Lasix. Follow-up with cardiology."
AI-enhanced discharge summary:
> "Mr. Johnson is a 72-year-old male with a history of heart failure with reduced ejection fraction (last EF 30%, echo 3/15/2026), type 2 diabetes, and stage 3 chronic kidney disease, admitted 6/2/2026 for acute decompensated heart failure. He presented with 10 lb weight gain over 2 weeks, progressive dyspnoea on exertion, and bilateral lower extremity oedema.
>
> Hospital course: Diuresed with IV furosemide 80mg BID, net negative 4.2L over 3 days. BNP improved from 1,850 to 620. Creatinine stable at 1.8. Discharge weight 198 lb (dry weight target 195 lb).
>
> Discharge medications: Furosemide increased from 40mg to 60mg daily. All other medications unchanged (see medication reconciliation).
>
> Follow-up: Cardiology in 7 days (appointment scheduled 6/12/2026). Daily weights — call clinic if weight increases >3 lb in 2 days. Sodium restriction <2g/day."
The AI version is not only more complete — it connects the dots between data points that the manual version leaves implicit.
> Look at data/discharge-summary-templates.json for the discharge summary generation templates used in the sandbox exercises.
The Revenue Cycle Impact
Medical NLP does not just save time — it directly impacts hospital revenue. In the US healthcare system, revenue depends on documentation:
AI-powered NLP catches these gaps in real time — flagging missing documentation elements before the note is signed, suggesting more specific diagnosis codes, and ensuring quality measures are captured.
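Flagging missing elements before the note is signed can be pictured as a checklist run against the draft text. The sketch below uses the diabetes quality measures mentioned at the start of this chapter (HbA1c, foot exam, eye exam referral). The patterns and the `find_documentation_gaps` function are illustrative assumptions, and keyword matching is a deliberate simplification of what clinical NLP systems do.

```python
import re

# Illustrative patterns only; real systems handle negation, synonyms,
# and context rather than simple keyword matches.
REQUIRED_ELEMENTS = {
    "HbA1c": re.compile(r"\b(hba1c|a1c)\b", re.IGNORECASE),
    "foot exam": re.compile(r"\bfoot exam", re.IGNORECASE),
    "eye exam referral": re.compile(r"\b(eye exam|ophthalmology) referral", re.IGNORECASE),
}


def find_documentation_gaps(note_text: str) -> list[str]:
    """Return the required elements that the note text never mentions."""
    return [
        name for name, pattern in REQUIRED_ELEMENTS.items()
        if not pattern.search(note_text)
    ]


draft_note = "T2DM follow-up. HbA1c 7.2% today. Foot exam: sensation intact."
print(find_documentation_gaps(draft_note))
# ['eye exam referral']
```

Running a check like this while the note is still a draft is what lets the physician add the missing element during the visit instead of after a claim is denied.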
Key Takeaways
This is chapter 3 of AI for Healthcare (Western).