
Responsible Healthcare AI

Bias, Regulation, Privacy & Accountability

Why Responsible AI Matters More in Healthcare

When a recommendation algorithm on a shopping site gets it wrong, you see an irrelevant product ad. When a healthcare AI gets it wrong, a patient might receive the wrong treatment, miss a critical diagnosis, or be denied care they need. The stakes are categorically different, and so the standards must be higher.

Responsible healthcare AI is not a nice-to-have ethics module tacked onto the end of a course. It is the foundation that determines whether AI in healthcare helps or harms. In this chapter, we will examine the specific ways healthcare AI can go wrong, the regulatory frameworks designed to prevent that, and the technical and organisational practices that make AI trustworthy.

Racial and Demographic Bias in Healthcare AI

Healthcare AI has already produced documented examples of bias that harmed real patients. These are not hypothetical risks — they are failures that have been published in peer-reviewed journals and covered by mainstream media.

Case Study 1: Pulse Oximetry Bias

A pulse oximeter is a small clip placed on a patient's finger that measures blood oxygen saturation (SpO2). It works by shining light through the skin and measuring how much is absorbed by oxygenated vs deoxygenated haemoglobin. It is one of the most common devices in medicine, used in every hospital and ambulance and, increasingly, in consumer wearables (Apple Watch, Fitbit).

In 2020, a landmark study in the *New England Journal of Medicine* found that pulse oximeters missed dangerously low oxygen levels (occult hypoxaemia) nearly three times as often in Black patients as in White patients. In practice, the device might read 95% (normal) when the true arterial saturation is 88% (dangerously low and requiring supplemental oxygen).

This is not an AI problem per se — it is a sensor calibration problem. But it becomes an AI problem when:

  • AI triage models use SpO2 readings as an input feature, systematically undertriaging Black and Hispanic patients
  • Clinical decision support systems recommend against supplemental oxygen based on falsely reassuring SpO2 readings
  • Deterioration prediction models miss early warning signs in patients with dark skin

The FDA has since responded, with a safety communication in 2021 and draft guidance in 2025 recommending that manufacturers test pulse oximeters on more diverse populations, but the installed base of biased devices remains enormous.

    Case Study 2: Dermatology AI on Dark Skin

    AI models that diagnose skin conditions from photographs perform significantly worse on darker skin tones. A 2021 study in *JAMA Dermatology* found that leading dermatology AI tools had accuracy rates of 80-90% on light skin but dropped to 55-70% on dark skin.

    The root cause is training data imbalance. The datasets used to train these models — including widely used research datasets like ISIC (International Skin Imaging Collaboration) — are overwhelmingly composed of images of light-skinned patients. The AI learned what melanoma looks like on white skin but never saw enough examples on dark skin to learn the different visual patterns.

    This has real consequences. Melanoma in Black patients is more often diagnosed at later stages (when survival rates are much lower), and AI tools that cannot detect lesions on dark skin will widen this gap rather than narrow it.

    Case Study 3: The Optum Algorithm

    In 2019, researchers at UC Berkeley published a study in *Science* revealing that a widely used algorithm by Optum (a UnitedHealth Group subsidiary) for identifying patients who need extra care was systematically biased against Black patients. The algorithm used healthcare spending as a proxy for health needs — but Black patients historically spend less on healthcare due to barriers like insurance coverage gaps, distrust of the medical system, and unequal access. As a result, the algorithm identified healthier white patients as higher-need than sicker Black patients.

    At equal levels of illness, Black patients were assigned lower risk scores. The study estimated that fixing this bias would increase the percentage of Black patients flagged for extra care from 17.7% to 46.5%.

    This case illustrates a critical principle: the choice of proxy variable determines whether the AI is fair. Healthcare spending is not the same as healthcare need. An AI system that conflates the two inherits and amplifies the structural inequities of the healthcare system it was trained on.
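
The proxy problem can be illustrated with a minimal sketch. The records below are invented for illustration and are not data from the study: `chronic_conditions` stands in for true health need, `annual_spend` stands in for the cost label, and group B is assumed to spend less at the same level of illness. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    patient_id: str
    group: str               # hypothetical demographic group label
    chronic_conditions: int  # stand-in for true health need
    annual_spend: int        # stand-in for the proxy label (cost, in dollars)


# Invented records: group B spends less than group A at equal illness.
PATIENTS = [
    Patient("a1", "A", 2, 9000),
    Patient("a2", "A", 3, 12000),
    Patient("a3", "A", 5, 20000),
    Patient("b1", "B", 2, 5000),
    Patient("b2", "B", 4, 8500),
    Patient("b3", "B", 5, 11000),
]


def flag_top(patients, key, n):
    """Return the IDs of the n patients ranked highest by `key`."""
    ranked = sorted(patients, key=key, reverse=True)
    return {p.patient_id for p in ranked[:n]}


def count_in_group(flagged_ids, patients, group):
    """Count how many flagged patients belong to `group`."""
    return sum(1 for p in patients if p.patient_id in flagged_ids and p.group == group)


# Same programme capacity (3 places), two different ranking targets.
flagged_by_spend = flag_top(PATIENTS, key=lambda p: p.annual_spend, n=3)
flagged_by_need = flag_top(PATIENTS, key=lambda p: p.chronic_conditions, n=3)

group_b_by_spend = count_in_group(flagged_by_spend, PATIENTS, "B")
group_b_by_need = count_in_group(flagged_by_need, PATIENTS, "B")
```

In this toy data, ranking by spending gives group B one of the three places, while ranking by need gives it two. The model code is identical in both cases; only the target it is asked to predict changes.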

    Case Study 4: Chest X-Ray AI and Underserved Populations

    A 2022 study in *Nature Medicine* found that chest X-ray AI models trained primarily on data from academic medical centres performed significantly worse on images from community hospitals, rural clinics, and safety-net hospitals (hospitals that serve a high proportion of uninsured and Medicaid patients). The X-ray machines, image quality, patient positioning, and disease prevalence all differ between these settings.

    An AI model that works beautifully at Massachusetts General Hospital may fail at a community health centre in rural Appalachia — and the patients at that health centre are the ones who need diagnostic AI the most.

    > Look at data/bias-case-studies.json for the detailed bias case studies and mitigation strategies used in the sandbox exercises.

    FDA Regulatory Pathway for AI/ML

    The US Food and Drug Administration has been the global leader in regulating AI as a medical device. As of 2025, the FDA has authorised over 900 AI/ML-enabled devices — primarily in radiology (75%), cardiology (10%), and ophthalmology (5%).

    The Three Pathways

    | Pathway | When Used | Review Level | Timeline | Example |
    | --- | --- | --- | --- | --- |
    | 510(k) | AI device is "substantially equivalent" to an existing cleared device | Moderate | 3-6 months | A new chest X-ray AI that is similar to an already-cleared product |
    | De Novo | Novel AI device with no predicate, low to moderate risk | Moderate-High | 6-12 months | Viz.ai's stroke detection (first of its kind) |
    | PMA (Premarket Approval) | High-risk devices (Class III) | Highest | 1-3 years | AI-guided surgical robots, closed-loop insulin delivery |

    The Predetermined Change Control Plan (PCCP)

    Traditional medical device regulation assumes that a device is fixed — you validate it, clear it, and it does not change. But AI models are designed to learn and improve. A model trained on 100,000 images today might be retrained on 500,000 images next year, with different performance characteristics.

    The FDA's 2023 guidance on Predetermined Change Control Plans allows manufacturers to pre-specify the types of changes their AI will undergo (new training data, algorithm updates, expanded indications) and the validation protocols they will follow. If the changes fall within the approved plan, the manufacturer does not need to submit a new 510(k) for each update.

    This is a notable regulatory innovation. It acknowledges that AI differs from traditional fixed devices and needs a regulatory model that accommodates continuous improvement.

    EU AI Act: High-Risk Classification

    The European Union's AI Act, which began phased implementation in 2025, takes a different approach from the FDA. Instead of regulating AI as a medical device (product-specific), the EU AI Act regulates AI by risk category (horizontal regulation).

    Most clinical AI, including AI-enabled medical devices that require third-party conformity assessment, is classified as high-risk, which triggers mandatory requirements:

    | Requirement | What It Means | Practical Implication |
    | --- | --- | --- |
    | Risk management system | Continuous process to identify and mitigate risks | Must document all known risks, including bias, and show how they are addressed |
    | Data governance | Training data must be relevant, representative, and free of errors | Must demonstrate diversity of training data across demographics |
    | Technical documentation | Detailed description of the AI system, its purpose, and its limitations | Full model card: architecture, training data, performance metrics, known failure modes |
    | Record-keeping | Automatic logging of AI system operation | Every prediction, recommendation, and alert must be logged and traceable |
    | Human oversight | Humans must be able to understand, monitor, and override AI decisions | Clinical AI cannot operate as a fully autonomous decision-maker; a "human in the loop" is mandatory |
    | Accuracy and robustness | AI must meet declared performance levels and be resilient to adversarial inputs | Must test against edge cases, data drift, and intentional manipulation |
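
As one way to picture the record-keeping and human oversight rows, here is a minimal sketch of a traceable prediction record that also captures a clinician override. The field names, the `PredictionRecord` class, and the helper functions are assumptions made for illustration; the Act does not prescribe this schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def digest_inputs(inputs: dict) -> str:
    """Hash the model inputs so the log can be traced without storing raw values."""
    canonical = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


@dataclass
class PredictionRecord:
    model_name: str
    model_version: str
    patient_ref: str       # pseudonymous reference, not a direct identifier
    inputs_digest: str
    output: str
    confidence: float
    clinician_action: str = "pending"  # "pending", "accepted", or "overridden"
    override_reason: Optional[str] = None
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def record_override(record: PredictionRecord, reason: str) -> PredictionRecord:
    """Mark a prediction as overridden; an empty reason is rejected."""
    if not reason.strip():
        raise ValueError("an override must include a documented reason")
    record.clinician_action = "overridden"
    record.override_reason = reason
    return record


def to_log_line(record: PredictionRecord) -> str:
    """Serialise the record as one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

A caller would create one record per prediction, call `record_override` if the clinician disagrees, and append `to_log_line(record)` to durable storage.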

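    Non-compliance with the EU AI Act can result in fines of up to 35 million EUR or 7% of global annual turnover, whichever is higher, for the most serious violations (prohibited AI practices). Breaches of the high-risk obligations listed above carry a lower ceiling of up to 15 million EUR or 3% of global annual turnover.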
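
The "whichever is higher" rule is simple arithmetic, sketched below with the 35 million EUR and 7% figures quoted above as defaults. The function name and parameters are illustrative; other penalty tiers can be modelled by passing different values.

```python
def max_fine_eur(global_turnover_eur: int,
                 fixed_cap_eur: int = 35_000_000,
                 turnover_percent: int = 7) -> int:
    """Return the fine ceiling: the higher of the fixed cap and a share of turnover."""
    turnover_based = global_turnover_eur * turnover_percent // 100
    return max(fixed_cap_eur, turnover_based)


# A company with 200 million EUR turnover: 7% is 14 million, so the fixed cap applies.
small_company_ceiling = max_fine_eur(200_000_000)

# A company with 2 billion EUR turnover: 7% is 140 million, which exceeds the fixed cap.
large_company_ceiling = max_fine_eur(2_000_000_000)
```

With these defaults the two amounts are equal at 500 million EUR of turnover; above that, the percentage-based figure sets the ceiling.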

    HIPAA De-Identification: Safe Harbor Method

    Training AI models requires data. In US healthcare, patient data is protected by HIPAA. To use patient data for AI development without individual patient consent, the data must be de-identified — stripped of all information that could identify a specific person.

    HIPAA defines two methods for de-identification. The more commonly used is the Safe Harbor method, which requires removal of 18 specific identifiers:

    | Category | Identifiers to Remove |
    | --- | --- |
    | Direct identifiers | Name, address (any geographic unit smaller than a state), dates (any element more specific than the year; ages over 89 are also removed or grouped as 90+), phone number, fax number, email, SSN, medical record number, health plan number, account number, certificate/licence number |
    | Vehicle/device identifiers | Vehicle serial numbers, device identifiers |
    | Digital identifiers | Web URLs, IP addresses, biometric identifiers, full-face photos |
    | Other | Any other unique identifying number, characteristic, or code |

    Additionally, the covered entity must have no actual knowledge that the remaining information could identify an individual.
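
A minimal sketch of what Safe Harbor-style stripping might look like on a flat record follows. The field names and the `safe_harbor_sketch` function are hypothetical, real schemas differ, and free-text notes or "any other unique code" are not handled here, so this illustrates the idea rather than serving as a compliance tool. It assumes dates are ISO strings (`YYYY-MM-DD`) and applies the Safe Harbor rule that a ZIP code may be kept as its first three digits unless that area has 20,000 or fewer residents, in which case it becomes `000`.

```python
# Hypothetical field names for a flat patient record.
DIRECT_IDENTIFIER_FIELDS = frozenset({
    "name", "street_address", "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_number", "account_number",
    "licence_number", "vehicle_serial", "device_id", "url",
    "ip_address", "biometric_id", "photo",
})

# Date fields are reduced to the year only.
DATE_FIELDS = {"birth_date": "birth_year", "admission_date": "admission_year"}


def safe_harbor_sketch(record, low_population_zip3=frozenset()):
    """Return a copy of `record` with identifiers removed or generalised."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}

    # Keep only the year from each date.
    for date_field, year_field in DATE_FIELDS.items():
        if date_field in out:
            out[year_field] = int(out.pop(date_field)[:4])

    # Ages over 89 are grouped, and the birth year that would reveal them is dropped.
    if isinstance(out.get("age"), int) and out["age"] > 89:
        out["age"] = "90+"
        out.pop("birth_year", None)

    # Truncate ZIP to three digits; sparsely populated areas become "000".
    if "zip" in out:
        zip3 = out.pop("zip")[:3]
        out["zip3"] = "000" if zip3 in low_population_zip3 else zip3

    return out
```

The set passed as `low_population_zip3` would have to come from census data; it is left empty by default here.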

    The Re-Identification Risk

    De-identification is not as simple as removing names and dates. Research has shown that:

  • 87% of the US population can be uniquely identified by ZIP code + date of birth + gender alone
  • Rare diseases, unusual combinations of diagnoses, or patients in small geographic areas can be re-identified even from "de-identified" datasets
  • Genomic data is inherently identifying — your DNA sequence is unique to you

    AI developers must use additional protections beyond Safe Harbor:

  • K-anonymity: ensure that each combination of quasi-identifiers (age range, ZIP code, gender) appears in at least K records in the dataset
  • Differential privacy: add calibrated random noise to query results or to the training process so that the inclusion or exclusion of any single patient does not meaningfully change the AI model's outputs
  • Federated learning: train the AI model across multiple hospitals without the data ever leaving each hospital's servers. The model travels to the data, not the data to the model.
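
The k-anonymity condition from the list above can be checked with a short sketch. The function names and the record layout (a list of dicts keyed by quasi-identifier) are assumptions for illustration.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k records."""
    return all(n >= k for n in group_sizes(records, quasi_identifiers).values())


def violating_groups(records, quasi_identifiers, k):
    """List the combinations that appear fewer than k times, in sorted order."""
    sizes = group_sizes(records, quasi_identifiers)
    return sorted(combo for combo, n in sizes.items() if n < k)


QUASI_IDENTIFIERS = ("age_range", "zip3", "gender")

sample = [
    {"age_range": "30-39", "zip3": "021", "gender": "F", "diagnosis": "asthma"},
    {"age_range": "30-39", "zip3": "021", "gender": "F", "diagnosis": "diabetes"},
    {"age_range": "40-49", "zip3": "021", "gender": "M", "diagnosis": "rare condition"},
]
```

Here the single 40-49 male record forms a group of one, so `sample` is not 2-anonymous; the usual remedies are to generalise the quasi-identifiers further (wider age bands, coarser geography) or to suppress the outlying record.
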

    > Look at data/deidentification-checklist.json for the Safe Harbor compliance checklist and re-identification risk assessment used in the sandbox exercises.
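
Differential privacy is easiest to see on a simple count query rather than on model training. The sketch below uses the Laplace mechanism: a counting query changes by at most 1 when one patient is added or removed, so noise with scale 1/epsilon is added to the true count. The function names are illustrative, and the noise is drawn with the standard library as the difference of two exponential samples.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching `predicate`.

    A count has sensitivity 1, so the Laplace scale is 1 / epsilon.
    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

Each call returns a different value, and every released answer consumes privacy budget, so a real system would also track the total epsilon spent across queries.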

    Algorithmic Accountability

    Who is responsible when a healthcare AI makes an error? This question does not have a clean answer, and the legal and ethical frameworks are still evolving.

    The Accountability Stack

    | Layer | Who Is Responsible | For What |
    | --- | --- | --- |
    | AI developer | Company that built the model | Training data quality, model validation, known limitations documented |
    | Deploying institution | Hospital or health system | Appropriate use, clinical workflow integration, monitoring performance in their population |
    | Clinician | Physician or nurse using the AI | Final clinical decision; AI recommendations are advisory, not deterministic |
    | Regulator | FDA, EU notified bodies | Clearance/approval based on submitted evidence; post-market surveillance |

    In current US law, the physician retains ultimate liability for clinical decisions. An AI recommendation does not transfer responsibility. But this creates a tension: if a physician overrides a correct AI recommendation and the patient is harmed, was the physician negligent? If a physician follows an incorrect AI recommendation, is the AI developer liable?

    These questions are working their way through courts and legislatures. In the interim, best practices include:

  • Transparency — Always show the clinician why the AI made a recommendation (explainability), not just what it recommended
  • Override documentation — When a clinician overrides an AI recommendation, the system should prompt them to document the reason
  • Continuous monitoring — Track AI performance in production, stratified by patient demographics, to detect drift and emerging bias
  • Incident reporting — Establish a clear process for reporting AI errors, similar to existing adverse event reporting (MedWatch in the US, Yellow Card in the UK)
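
Stratified monitoring, as described in the list above, can be sketched as a per-group sensitivity calculation. The input format (tuples of group label, true label, predicted label), the function names, and the 0.10 gap threshold in the usage example are assumptions for illustration, not a clinical standard.

```python
from collections import defaultdict


def sensitivity_by_group(cases):
    """Compute sensitivity (true positive rate) per group.

    `cases` is an iterable of (group, y_true, y_pred) with binary labels.
    Groups with no positive cases are omitted from the result.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for group, y_true, y_pred in cases:
        if y_true == 1:
            if y_pred == 1:
                true_pos[group] += 1
            else:
                false_neg[group] += 1
    groups = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}


def groups_below_reference(sensitivities, reference_group, max_gap):
    """Return groups whose sensitivity trails the reference by more than max_gap."""
    reference = sensitivities[reference_group]
    return sorted(g for g, s in sensitivities.items() if reference - s > max_gap)


# Invented production log: (group, true label, model prediction).
cases = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 1, 0), ("B", 0, 1),
]

sens = sensitivity_by_group(cases)
flagged = groups_below_reference(sens, reference_group="A", max_gap=0.10)
```

In this invented log the model catches 3 of 4 true cases in group A but only 1 of 4 in group B, a gap that an aggregate sensitivity of 50% would hide.
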

    A Framework for Evaluating Healthcare AI

    Before deploying or purchasing a healthcare AI tool, ask these five questions:

  • What data was it trained on? Does the training population match your patient population in terms of demographics, disease prevalence, and clinical setting?
  • How was it validated? Was it tested on an external dataset (different from the training data), and did that external dataset include diverse populations?
  • What are its known failure modes? Every AI has blind spots. Does the vendor disclose them? Are they documented in the model card?
  • How does it integrate into the clinical workflow? Does it appear at the right time, with enough context, and with an easy override mechanism?
  • How is it monitored post-deployment? Is there a plan to track performance over time, detect drift, and retrain when needed?

    If the vendor cannot answer these questions clearly, the tool is not ready for clinical use.

    Key Takeaways

  • Bias in healthcare AI is documented and harmful, not hypothetical: pulse oximetry, dermatology AI, and risk prediction algorithms have all shown measurable bias against patients with dark skin, and against Black patients specifically
  • The FDA has authorised 900+ AI devices but is still building new frameworks: the Predetermined Change Control Plan is a regulatory innovation that accommodates AI's ability to learn and improve
  • The EU AI Act classifies healthcare AI as high-risk: this requires risk management, data governance, human oversight, and continuous monitoring, with fines of up to 35 million EUR for the most serious violations
  • HIPAA de-identification is necessary but not sufficient: Safe Harbor removes 18 identifiers, but re-identification is still possible without additional protections like differential privacy and federated learning
  • Algorithmic accountability is shared but physician liability remains: AI recommendations are advisory, and the clinician bears ultimate responsibility for clinical decisions

    This is chapter 6 of AI for Healthcare (Western).
