
Healthcare GenAI at a Turning Point: From Hype to Real-World Evidence

 

ELHS Monthly Newsletter (April 2026)


 

 

 

Dear Friends,

 

Over the past month, several studies have highlighted the limitations of LLMs when tested in more realistic clinical settings, raising questions about whether generative AI (GenAI) is ready for routine clinical care. I expect general opinion to swing soon from overly optimistic to overly pessimistic, and with that swing will come a greater focus on evaluation and evidence in real-world clinical settings. This is a necessary phase that the healthcare GenAI revolution must go through before it can deliver on its promise.

Simply put, healthcare must generate clinical evidence for each task in routine workflows if GenAI is to improve those tasks—for example, enabling earlier diagnosis of rare diseases. This is where my work has been focused. Traditional approaches to evidence generation are too slow and expensive, so new system-level solutions are needed.

As AI evaluation moves into real clinical contexts, recent progress includes the following studies, viewpoints, and policy updates (see the references in the next section for details):

  • EMR-integrated LLMs show high guideline alignment and low hallucination rates. However, clinicians often left documentation unchanged, and harmful recommendations still appeared in this low-resource setting, some reaching the final notes. Adoption is the bottleneck: impact depends on human–AI interaction, not just model quality.
  • Clinical AI for breast cancer screening is reaching readiness: it improves detection, efficiency, and fairness, but still requires recalibration under real-world distribution shifts.
  • Harvard experts propose overseeing clinical GenAI the way human clinicians are prepared, evaluated, and overseen: mastering foundational knowledge, demonstrating clinical competence, and continuous learning and oversight.
  • A new framework (CES) simulates dynamic, stateful care environments, shifting evaluation from static benchmarks to workflow- and task-level assessment. AI is evaluated as a collaborator over time, not a one-step predictor.
  • A JAMA viewpoint outlined a coordinated system for pediatric care of rare genetic diseases, where health systems integrate genetics into early diagnostics to inform therapies, clinicians transition from reactive watchful waiting to proactive phenotype- and genotype-informed management, and embedded decision support guides testing, surveillance, and care coordination.
  • New FDA draft guidance enables platform-based approval for individualized therapies for rare genetic diseases, emphasizing post-market real-world evidence and lifecycle oversight—a shift from pre-approval to continuous validation in practice.

🔑 Takeaway
The future is not just better models—it is systems that learn continuously in care. This requires embedded evaluation in workflows, continuous monitoring and recalibration, and scalable real-world evidence generation.

👉 GenAI + Learning Health Systems are emerging as the architecture to deliver equitable, predictive precision healthcare at scale.

Our vision paper on accelerating clinical evidence generation, co-authored with Stanford and Harvard professors and presented at last year’s Chen Institute AI for Accelerating Science Symposium, is now available online. In the final section, I share a conversation with a ChatGPT copilot on using rare diseases as a testbed to demonstrate the full potential of GenAI.

Keep generating evidence,

Warm regards,
AJ

AJ Chen, PhD
Founder & PI, ELHS Institute
Silicon Valley, USA

https://elhsi.org/Newsletters
https://elhsi.com

 

~

 

 

From Page Mill

(Recent papers, news, and events showcasing the progress of GenAI and LHS) 

 

Agweyu A, Mwaniki P, Musau W, et al. Safety of a large language model-based clinical decision support system in African primary healthcare. Nat Health. 2026.

[2026/3] Here we conducted a retrospective evaluation of an electronic medical record-embedded large language model clinical decision support system deployed across 16 primary care clinics in Kenya, between July and September 2024. A panel of trained physicians reviewed 1,469 records. Hallucinations were uncommon, occurring in 50 encounters (3.4%, 95% confidence interval (CI) 2.5–4.5), and most often involved misexpanded acronyms or drug names. Clinical management guidance aligned with local guidelines in almost all cases (1,455; 99%, 95% CI 98.4–99.5). Despite this, clinicians did not modify documentation in 917 encounters (62%, 95% CI 59.9–64.9). Safety assessments identified actively harmful recommendations from the large language model in 115 encounters (7.8%, 95% CI 6.5–9.3), with 67 such recommendations appearing in the final documentation. Conversely, risk present in the clinician’s initial notes was fully mitigated in 118 encounters (8.0%, 95% CI 6.7–9.5 overall; 12.1%, 95% CI 9.5–15.2 of amended cases). Overall, the tool showed strong potential to support quality improvement, but the asymmetric adoption of harmful versus beneficial outputs underscores the need for usability optimization, local guardrails and prospective trials to confirm patient-level benefit.
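Each percentage in this abstract is a simple count over the 1,469 reviewed records (for example, 50/1,469 ≈ 3.4%). The abstract does not state how its 95% CIs were computed, so the Python sketch below assumes the Clopper–Pearson exact binomial interval, found by bisection on the binomial CDF using only the standard library. It is a way to sanity-check reported figures, not the authors' analysis code.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summing terms in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def _bisect_increasing(f, iters: int = 60) -> float:
    """Root of a function that increases on (0, 1) and changes sign there."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Assumed method: exact (Clopper-Pearson) two-sided CI for the proportion k/n."""
    # Lower bound: the p at which P(X >= k) equals alpha/2.
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: (1.0 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha/2.
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


n = 1469  # records reviewed by the physician panel
for label, k in [("hallucinations", 50), ("guideline-aligned", 1455),
                 ("harmful recommendations", 115)]:
    lo, hi = clopper_pearson(k, n)
    print(f"{label}: {k / n:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

If the authors used an exact interval, the printed bounds should closely track the published ones; a different method (for example, Wilson) could shift the last digit.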

Kelly CJ, Wilson M, Warren LM, et al. Diagnostic accuracy, fairness and clinical implementation of AI for breast cancer screening: results of multicenter retrospective and prospective technical feasibility studies. Nat Cancer. 2026 Mar;7(3):494-506.

[2026/3] Artificial intelligence (AI) promises to enhance breast cancer screening. Here we evaluated Google’s mammography AI system (version 1.2) across two phases: a retrospective study using 115,973 mammograms from five National Health Service screening services with 39-month follow-up and a prospective noninterventional feasibility deployment at 12 sites (9,266 cases). The primary endpoint was AI sensitivity and specificity versus first reader using a 5% noninferiority margin. The secondary endpoints were performance versus second or consensus readers and breast-level analyses. Retrospectively, AI achieved superior sensitivity (0.541 versus 0.437 for first reader, P < 0.001) and noninferior specificity (0.943 versus 0.952, P < 0.001). Cancer detection rate increased from 7.54 to 9.33 per 1,000 women, with AI detecting 25.0% of interval cancers. Performance was particularly strong for first screens (39.3% fewer recalls, 8.8% higher detection) and invasive cancers. No systematic demographic disparities were observed. Simulated second-reader replacement reduced reading time by 32% while increasing detection by 17.7%. Prospective deployment confirmed technical feasibility but revealed a distribution shift requiring threshold recalibration. Implementation requires adaptive calibration and continuous monitoring to ensure safety and equity.

Patel B, Blumenthal D. A Novel Approach to Overseeing the Clinical Application of Generative AI. JAMA Health Forum. 2026;7(3):e256947.

[2026/3] We suggest overseeing clinical GAI through a system analogous to the preparation, evaluation, and lifelong professional oversight of human clinicians. This is not a regulatory scheme but an oversight mechanism. Its components are mastering foundational knowledge, demonstrating clinical competence, and continuous learning and oversight. The implementation of such a clinical GAI development and oversight system would require new authorities and institutions as well as new funding mechanisms.

Luo L, Kim SE, Zhang X, et al. A clinical environment simulator for dynamic AI evaluation. Nat Med. 2026 Mar;32(3):820-827.

[2026/3] We propose the Clinical Environment Simulator (CES), a framework that evaluates clinical LLMs within digital hospital environments where every decision dynamically alters future states. Unlike current benchmarks, the CES framework requires clinical LLMs to execute decisions through realistic electronic health record interfaces, while managing trade-offs between individual patient optimization and system-wide efficiency.
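To make the contrast between static benchmarks and stateful evaluation concrete, here is a deliberately tiny Python toy. It is not CES and does not reflect its interface: the two-bed ward, the three-step stay, the severity labels, and both policies are hypothetical. Both policies score perfectly when each case is judged alone, but only the stateful run reveals that admitting every indicated patient leaves no bed for later severe arrivals.

```python
from dataclasses import dataclass, field
from typing import Callable

# A policy sees the patient's severity and the number of free beds, and returns admit/defer.
Policy = Callable[[str, int], bool]


@dataclass
class ToyWard:
    """Hypothetical ward: each admission holds a bed for `stay` steps."""
    beds: int = 2
    stay: int = 3
    occupied: list[int] = field(default_factory=list)  # steps left for each occupied bed
    severe_turned_away: int = 0

    def free_beds(self) -> int:
        return self.beds - len(self.occupied)

    def advance(self) -> None:
        # Time passes: every stay shrinks by one step and finished stays free their bed.
        self.occupied = [d - 1 for d in self.occupied if d > 1]

    def arrive(self, severity: str, policy: Policy) -> None:
        self.advance()
        wants_admit = policy(severity, self.free_beds())
        if wants_admit and self.free_beds() > 0:
            self.occupied.append(self.stay)
        elif severity == "severe":
            self.severe_turned_away += 1  # deferred by choice or because no bed was left


def static_accuracy(policy: Policy, cases: list[str], beds: int = 2) -> float:
    # Static benchmark: every case is judged alone, as if the ward were empty.
    labels = [sev in ("moderate", "severe") for sev in cases]
    preds = [policy(sev, beds) for sev in cases]
    return sum(p == y for p, y in zip(preds, labels)) / len(cases)


def stateful_harm(policy: Policy, cases: list[str], beds: int = 2, stay: int = 3) -> int:
    # Stateful run: the same cases arrive in order and share one ward.
    ward = ToyWard(beds=beds, stay=stay)
    for sev in cases:
        ward.arrive(sev, policy)
    return ward.severe_turned_away


def admit_if_indicated(severity: str, free_beds: int) -> bool:
    return severity in ("moderate", "severe")


def capacity_aware(severity: str, free_beds: int) -> bool:
    # Keep the last free bed in reserve for severe arrivals.
    return severity == "severe" or (severity == "moderate" and free_beds > 1)


cases = ["moderate", "moderate", "severe", "moderate", "severe", "severe"]
for name, policy in [("admit_if_indicated", admit_if_indicated),
                     ("capacity_aware", capacity_aware)]:
    print(name, static_accuracy(policy, cases), stateful_harm(policy, cases))
# admit_if_indicated 1.0 2
# capacity_aware 1.0 0
```

The static score cannot separate the two policies; only the stateful run exposes the trade-off between optimizing each patient and keeping capacity for the whole ward.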

Ostrer H. When Rare Is Common—Rethinking Pediatric Genetic Care. JAMA. 2026;335(15):1293–1294.

[2026/3, Viewpoint] The transformation of pediatric genetic care from a niche subspecialty to a core health system responsibility requires more than technological advancement; it demands a coordinated mandate across the medical ecosystem. Health systems must integrate genetics into early diagnostics to inform therapies, supported by payers who recognize the value-based necessity of early genomic intervention. As professional societies codify universal standards and laboratories deliver rapid, variant-specific insights, clinicians must transition from reactive watchful waiting to proactive phenotype- and genotype-informed management. Electronic health record systems play a central role in managing the longitudinal needs of children with genetic disease. Embedded decision support can guide testing, surveillance, and care coordination, while standardized workflows reduce missed follow-up. Clearly assigned responsibilities within the electronic health record improve accountability and communication.

Gottlieb S, Kimbrell M. FDA’s Promising New Framework for Rare Genetic Diseases. JAMA Health Forum. 2026;7(3):e261227. doi:10.1001/jamahealthforum.2026.1227

[2026/3] FDA’s New Path for Individualized Therapies. Recognizing this shift, the FDA recently issued draft guidance outlining a regulatory framework for individualized therapies. The guidance addresses treatments designed, developed, and tested for small groups of patients, sometimes for a single individual. These bespoke, N-of-1 therapies are crafted to correct the precise genetic perturbation underlying an individual patient’s disease. Under the new policy, a sponsor may seek approval for a core genetic therapeutic platform. The anchor might be a viral vector engineered to deliver a genetic change to a particular gene, or an oligonucleotide designed to target a gene known to drive a shared disease phenotype—even when the precise variation differs from patient to patient. The FDA’s new draft policy aims to create a pathway in which personalized genetic therapies can advance with far less friction from the laboratory to the clinic and, ultimately, the market.

 

~

 

 

In the Eyes of GPTs

(AJ in conversation with ChatGPT to understand how machines view human healthcare)

 

User  Let’s discuss how to apply GenAI to address the challenges in rare diseases. What are the key challenges in rare diseases?

ChatGPT

Rare diseases are where GenAI could have outsized impact—but only if we’re clear-eyed about the constraints. The challenges are not just “rarity”; they’re structural across the entire healthcare system.

1. Diagnostic Delay and Missed Diagnoses

  • Patients often wait years and see multiple clinicians before diagnosis
  • Symptoms are nonspecific, heterogeneous, and evolving
  • Most clinicians will never encounter many rare diseases in practice

👉 Core issue: knowledge is fragmented and not accessible at the point of care

2. Extreme Data Scarcity

  • Very small patient populations per disease
  • Limited labeled datasets, often biased toward major academic centers
  • Sparse longitudinal data linking symptoms → diagnosis → outcomes

👉 Core issue: traditional ML assumptions (large, clean datasets) break down

3. Phenotypic and Genotypic Heterogeneity

  • Same disease can present very differently across patients
  • Different mutations can lead to similar phenotypes (and vice versa)
  • Clinical data (EHR) is noisy and incomplete

👉 Core issue: pattern recognition is high-dimensional and weakly structured

4. Fragmented Data Across Systems

  • Data spread across hospitals, labs, imaging, genomics
  • Poor interoperability (despite standards such as FHIR and the OMOP Common Data Model)
  • Critical clues buried in unstructured notes

👉 Core issue: no unified patient view

5. Limited Clinical Evidence

  • Few randomized trials; many diseases lack a standard of care
  • Evidence often based on case reports or small cohorts
  • Difficult to generalize across populations

👉 Core issue: evidence generation is too slow and underpowered

6. Therapeutic Gaps

  • Many rare diseases have no approved treatments
  • Drug development is costly relative to market size
  • Precision therapies (e.g., N-of-1) are emerging but not scalable yet

👉 Core issue: innovation exists, but translation is bottlenecked

7. Health Inequity Is Amplified

  • Diagnosis depends heavily on access to specialists and advanced testing
  • Underserved populations are systematically underdiagnosed
  • Global disparities are large

👉 Core issue: rarity + inequity compounds invisibility

8. Lack of Continuous Learning Systems

  • Each case is treated as isolated, not part of a learning loop
  • No systematic way to aggregate rare disease signals across sites
  • Missed opportunity to learn from every patient

👉 Core issue: the system does not learn from rarity

🔑 Synthesis

Rare diseases expose the fundamental limitations of today’s healthcare system:

Knowledge is sparse, data is fragmented, and learning is not continuous.

That’s exactly why this space is so important:

  • If GenAI + LHS can work here, it can work anywhere
  • Rare diseases are the stress test for:
    • diagnostic intelligence
    • data integration
    • real-world evidence generation
    • equity-by-design systems

 

User  What does the term “precision therapies (e.g., N-of-1)” mean?

ChatGPT

“Precision therapies (e.g., N-of-1)” refers to treatments tailored to a single patient, based on that individual’s specific biology—often their exact genetic mutation.

Break it down:

  • Precision therapy: treatment designed to match a patient’s molecular profile (e.g., a specific gene mutation), rather than a broad disease category.
  • N-of-1 therapy: the most extreme form—a therapy created for one person (sample size = 1).

Concrete example:
A child has a rare genetic disease caused by a unique mutation. Researchers design a custom antisense oligonucleotide that targets that exact mutation—no one else has the same therapy.

This is different from traditional drugs:

  • Traditional: one drug → thousands of patients
  • Precision: one drug → a subgroup of patients
  • N-of-1: one drug → one patient

Why it matters in rare diseases:

  • Many rare diseases are genetically driven and highly specific
  • There may never be enough patients for a standard clinical trial
  • So treatment must shift from “population-based” → individualized

The challenge:

  • Hard to test with traditional trials
  • Expensive and slow to develop
  • Requires new regulatory models (e.g., platform approvals + real-world evidence)

 

User  How can GenAI be used to create a continuous learning system for a specific rare disease? There are many patient-led networks for individual rare diseases—can a GenAI-based approach support these important patient communities?

ChatGPT

You’re pointing at exactly the right leverage point. Rare diseases already have the hardest part solved: motivated, networked patients. What’s missing is a system that can turn their fragmented experiences into continuous, clinically usable learning.

GenAI can help—but only if it’s used to build a learning loop, not just a chatbot.

What a “continuous learning system” actually means

For a specific rare disease, the goal is:

Every patient interaction → structured knowledge → shared learning → improved care for the next patient

That loop has four layers:

 

1. Data capture (from patient networks)

Patient groups already generate rich data:

  • Symptoms, timelines, triggers
  • Treatment attempts and responses
  • Lab reports, genetic results, imaging

GenAI role:

  • Convert unstructured inputs (posts, PDFs, notes) into structured data
  • Normalize language (patient → clinical terms)
  • Extract timelines and key events

👉 This turns scattered experiences into computable patient journeys
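A minimal Python sketch of the capture step, assuming posts contain sentences of the form “Month YYYY: …”. The regular expression stands in for the GenAI extractor described above; it is a hypothetical simplification, and a real pipeline would use an LLM plus validation to handle relative or missing dates.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEvent:
    date: datetime
    text: str


# Rule-based stand-in for an LLM extractor: it only recognizes "Month YYYY: ..." sentences.
DATED_SENTENCE = re.compile(r"([A-Z][a-z]+ \d{4}):\s*([^.]+)\.")


def extract_timeline(post: str) -> list[TimelineEvent]:
    """Turn a free-text patient post into date-ordered events (hypothetical format)."""
    events = []
    for when, what in DATED_SENTENCE.findall(post):
        try:
            date = datetime.strptime(when, "%B %Y")
        except ValueError:
            continue  # capitalized word that is not a month name; skip rather than guess
        events.append(TimelineEvent(date, what.strip()))
    return sorted(events, key=lambda e: e.date)


post = ("June 2023: started a new anti-seizure medicine, fewer seizures. "
        "March 2023: first seizure at night. Around then he also seemed floppy.")
for event in extract_timeline(post):
    print(event.date.strftime("%Y-%m"), "|", event.text)
```

The undated sentence about the child seeming floppy is dropped, which is the kind of clue an LLM-based extractor would be expected to recover and a rule-based one misses.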

 

2. Standardization and integration

Data must be harmonized across patients:

  • Map to clinical concepts (diagnoses, labs, meds)
  • Align with standards like FHIR or OMOP Common Data Model
  • Link patient-reported data with clinical records (when available)

GenAI role:

  • Assist semantic mapping and data cleaning
  • Handle incomplete and noisy inputs
  • Bridge patient language and medical ontologies

👉 This creates a shared data layer across the network
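A minimal sketch of the normalization step, assuming a small hand-written synonym table (hypothetical, concept labels only, no ontology codes) and emitting a dict shaped like a FHIR R4 Observation. The status "preliminary" marks the entry as patient-reported and not yet verified; mapping to real HPO or SNOMED CT codes, and the GenAI-assisted mapping itself, are left out.

```python
from typing import Optional

# Hypothetical, hand-written synonym table for illustration only. A real system would
# map to a standard ontology (for example HPO or SNOMED CT) and store coded identifiers.
PATIENT_TO_CLINICAL = {
    "floppy": "Hypotonia",
    "fits": "Seizure",
    "seizures": "Seizure",
    "late to walk": "Delayed ability to walk",
}


def normalize(phrase: str) -> Optional[str]:
    """Map patient wording to a clinical concept label, or None if unknown."""
    return PATIENT_TO_CLINICAL.get(phrase.strip().lower())


def to_fhir_observation(patient_id: str, phrase: str, when: str) -> Optional[dict]:
    """Wrap a normalized symptom in a FHIR R4 Observation-shaped dict."""
    concept = normalize(phrase)
    if concept is None:
        return None  # unmapped wording should go to human curation, not be guessed
    return {
        "resourceType": "Observation",
        "status": "preliminary",  # patient-reported, not yet clinician-reviewed
        "code": {"text": concept},
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": when,  # FHIR allows partial dates such as "2023-03"
        "note": [{"text": f"Patient wording: {phrase.strip()}"}],
    }


print(to_fhir_observation("example-1", "Floppy", "2023-03"))
print(to_fhir_observation("example-1", "tired all the time", "2023-04"))  # None
```

Keeping the patient's original wording in a note preserves provenance, so a clinician can check the mapping later.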

 

3. Knowledge generation (learning from the cohort)

Once data is structured:

  • Identify patterns (symptom clusters, early signals)
  • Detect treatment responses and subgroups
  • Generate hypotheses for diagnosis or therapy

GenAI role:

  • Aggregate cases into structured summaries
  • Compare similar patients (“patients-like-me”)
  • Generate differential diagnoses or risk signals

👉 This transforms data into actionable insights
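One simple way to sketch “patients-like-me” is to rank cohort members by the overlap of their normalized phenotype concepts. Jaccard similarity is an assumed choice here, not a method from the source; production systems would likely weight rare, specific features more heavily than common ones. The cohort is hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of phenotype concepts two patients have in common (0 to 1)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def patients_like_me(query: set[str], cohort: dict[str, set[str]],
                     top_k: int = 2) -> list[tuple[str, float]]:
    scored = [(pid, jaccard(query, feats)) for pid, feats in cohort.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))  # most similar first, ties by id
    return scored[:top_k]


# Hypothetical de-identified cohort: patient id -> normalized phenotype concepts.
cohort = {
    "P1": {"Seizure", "Hypotonia", "Delayed ability to walk"},
    "P2": {"Seizure", "Hypotonia"},
    "P3": {"Hearing loss", "Delayed ability to walk"},
}
print(patients_like_me({"Seizure", "Hypotonia"}, cohort))
# [('P2', 1.0), ('P1', 0.6666666666666666)]
```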

 

4. Feedback into care (closing the loop)

Insights must return to patients and clinicians:

  • Decision support for earlier diagnosis
  • Suggestions for testing or management
  • Identification of similar cases or trials

GenAI role:

  • Deliver personalized, context-aware reports
  • Update recommendations as new data becomes available
  • Support clinician review and validation

👉 This is where learning becomes real-world impact
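The feedback layer hinges on clinician validation. The Python sketch below shows one hypothetical way to enforce it: AI-drafted suggestions are released to a patient only after a clinician approves them, and approvals lapse when new cohort data arrive, so recommendations are regenerated and re-reviewed. The class and method names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Suggestion:
    patient_id: str
    text: str
    evidence_version: int  # cohort snapshot the suggestion was generated from
    status: Status = Status.DRAFT


@dataclass
class ReviewGate:
    """Hypothetical gate: AI drafts reach patients only after clinician approval,
    and approvals expire when the underlying evidence changes."""
    evidence_version: int = 0
    suggestions: list[Suggestion] = field(default_factory=list)

    def draft(self, patient_id: str, text: str) -> Suggestion:
        s = Suggestion(patient_id, text, self.evidence_version)
        self.suggestions.append(s)
        return s

    def review(self, s: Suggestion, approve: bool) -> None:
        s.status = Status.APPROVED if approve else Status.REJECTED

    def new_data_arrived(self) -> None:
        # New cohort data: older suggestions must be regenerated and re-reviewed.
        self.evidence_version += 1

    def releasable(self, patient_id: str) -> list[str]:
        return [s.text for s in self.suggestions
                if s.patient_id == patient_id
                and s.status is Status.APPROVED
                and s.evidence_version == self.evidence_version]


gate = ReviewGate()
s = gate.draft("example-1", "Ask your care team whether repeat genetic testing is worthwhile.")
print(gate.releasable("example-1"))  # [] until a clinician reviews it
gate.review(s, approve=True)
print(gate.releasable("example-1"))  # now released
gate.new_data_arrived()
print(gate.releasable("example-1"))  # [] again: the approval predates the new evidence
```

Letting every approval expire on each data update is a conservative design choice; a real deployment might expire only the suggestions whose supporting evidence actually changed.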

 

Why patient networks are uniquely powerful

Patient-led groups solve three problems that institutions struggle with:

  • Scale (they aggregate globally)
  • Longitudinal data (lived experience over years)
  • Engagement (strong motivation to share)

But they lack:

  • Standardization
  • Clinical integration
  • Evidence generation infrastructure

GenAI can bridge exactly these gaps.

 

The critical design principle

The biggest mistake would be to build just “an AI tool for patients.”

Instead, build a learning system owned by the patient community and connected to clinicians.

That means:

  • Patients contribute data
  • GenAI structures and learns
  • Clinicians validate and act
  • The system improves continuously

Key risks to manage (important)

  • Data quality & bias → need clinician-in-the-loop validation
  • Safety → no direct medical advice without oversight
  • Privacy & governance → patient-controlled data sharing
  • Equity → avoid only serving highly connected populations
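For the privacy and governance point, here is a minimal Python sketch of patient-controlled sharing, assuming each patient grants or revokes use of their data per purpose (the purpose names and the ConsentRegistry class are hypothetical). Consent is checked at query time rather than copied onto records, so a revocation takes effect on the next query, and anything not explicitly granted is denied by default.

```python
class ConsentRegistry:
    """Hypothetical per-patient, per-purpose consent store (default deny)."""

    def __init__(self) -> None:
        self._grants: dict[str, set[str]] = {}

    def grant(self, patient_id: str, purpose: str) -> None:
        self._grants.setdefault(patient_id, set()).add(purpose)

    def revoke(self, patient_id: str, purpose: str) -> None:
        self._grants.get(patient_id, set()).discard(purpose)

    def allows(self, patient_id: str, purpose: str) -> bool:
        return purpose in self._grants.get(patient_id, set())


def shareable(records: list[dict], registry: ConsentRegistry, purpose: str) -> list[dict]:
    # Consent is looked up at query time, so revocations apply to the next query.
    return [r for r in records if registry.allows(r["patient_id"], purpose)]


records = [{"patient_id": "A", "symptom": "Seizure"},
           {"patient_id": "B", "symptom": "Hypotonia"}]
registry = ConsentRegistry()
registry.grant("A", "research")
registry.grant("B", "research")
print([r["patient_id"] for r in shareable(records, registry, "research")])  # ['A', 'B']
registry.revoke("B", "research")
print([r["patient_id"] for r in shareable(records, registry, "research")])  # ['A']
```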

Bottom line

Yes—GenAI can support rare disease patient networks, but the real opportunity is bigger:

Turn each rare disease community into a self-improving learning system

 

~

 

🔹 ELHS Institute Resources 🔹

🤖 ELHS GenAI Platform

🌐 ELHS Institute Website

📰 ELHS Monthly Newsletters

▶️ ELHS Videos

👉 For Clinical AI technology support, contact us at support@elhsi.org 📩

 

~ the end ~