(Paving the Way for Global Health Equity with Generative AI and Learning Health Systems)
Dear Friends,
Over the past month, several studies have highlighted the limitations of large language models (LLMs) when tested in more realistic clinical settings, raising questions about whether generative AI (GenAI) is ready for routine clinical care. I expect opinion may soon swing from overly optimistic to overly pessimistic, bringing a sharper focus on evaluation and evidence in real-world clinical settings. This is a necessary phase that the healthcare GenAI revolution must go through before it can deliver on its promise.
Simply put, healthcare must generate clinical evidence for each task in routine workflows if GenAI is to improve those tasks—for example, enabling earlier diagnosis of rare diseases. This is where my work has been focused. Traditional approaches to evidence generation are too slow and expensive, so new system-level solutions are needed.
As AI evaluation enters real clinical contexts, recent progress is illustrated by the studies summarized in the next section.
🔑 Takeaway
The future is not just better models—it is systems that learn continuously in care. This requires embedded evaluation in workflows, continuous monitoring and recalibration, and scalable real-world evidence generation.
👉 GenAI + Learning Health Systems are emerging as the architecture to deliver equitable, predictive precision healthcare at scale.
Our vision paper on accelerating clinical evidence generation, co-authored with Stanford and Harvard professors and presented at last year’s Chen Institute AI for Accelerating Science Symposium, is now available online. In the final section, I share a conversation with a ChatGPT copilot on using rare diseases as a testbed to demonstrate the full potential of GenAI.
Keep generating evidence,
Warm regards,
AJ
AJ Chen, PhD
Founder & PI, ELHS Institute
Silicon Valley, USA
https://elhsi.org/Newsletters
https://elhsi.com
~

From Page Mill
(Recent papers, news, and events showcasing the progress of GenAI and LHS)
Agweyu A, Mwaniki P, Musau W, et al. Safety of a large language model-based clinical decision support system in African primary healthcare. Nat Health. 2026.
[2026/3] Here we conducted a retrospective evaluation of an electronic medical record-embedded large language model clinical decision support system deployed across 16 primary care clinics in Kenya, between July and September 2024. A panel of trained physicians reviewed 1,469 records. Hallucinations were uncommon, occurring in 50 encounters (3.4%, 95% confidence interval (CI) 2.5–4.5), and most often involved misexpanded acronyms or drug names. Clinical management guidance aligned with local guidelines in almost all cases (1,455; 99%, 95% CI 98.4–99.5). Despite this, clinicians did not modify documentation in 917 encounters (62%, 95% CI 59.9–64.9). Safety assessments identified actively harmful recommendations from the large language model in 115 encounters (7.8%, 95% CI 6.5–9.3), with 67 such recommendations appearing in the final documentation. Conversely, risk present in the clinician’s initial notes was fully mitigated in 118 encounters (8.0%, 95% CI 6.7–9.5 overall; 12.1%, 95% CI 9.5–15.2 of amended cases). Overall, the tool showed strong potential to support quality improvement, but the asymmetric adoption of harmful versus beneficial outputs underscores the need for usability optimization, local guardrails and prospective trials to confirm patient-level benefit.
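The percentages above are binomial proportions over the 1,469 reviewed records, each reported with a 95% confidence interval. The excerpt does not state which interval method the authors used; as a minimal sketch, assuming the exact (Clopper-Pearson) method, the interval can be recomputed from the raw counts with only the standard library. The function names `binom_cdf`, `_solve` and `clopper_pearson` are illustrative, not from the paper.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space for stability."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total


def _solve(f, target: float, increasing: bool, iters: int = 100) -> float:
    """Bisection on p in (0, 1) for f(p) = target, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        # If f is increasing and still below target, the root lies at larger p (and vice versa).
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for the proportion x / n."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2 (increasing in p).
    lower = 0.0 if x == 0 else _solve(lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2, True)
    # Upper bound: the p at which P(X <= x) equals alpha / 2 (decreasing in p).
    upper = 1.0 if x == n else _solve(lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


for label, count in [("hallucinations", 50), ("harmful recommendations", 115)]:
    lo, hi = clopper_pearson(count, 1469)
    print(f"{label}: {count / 1469:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

With these counts the intervals should land close to the reported 2.5–4.5% and 6.5–9.3% ranges; a small discrepancy would suggest the authors used a different interval method, such as Wilson.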
Kelly CJ, Wilson M, Warren LM, et al. Diagnostic accuracy, fairness and clinical implementation of AI for breast cancer screening: results of multicenter retrospective and prospective technical feasibility studies. Nat Cancer. 2026 Mar;7(3):494-506.
[2026/3] Artificial intelligence (AI) promises to enhance breast cancer screening. Here we evaluated Google’s mammography AI system (version 1.2) across two phases: a retrospective study using 115,973 mammograms from five National Health Service screening services with 39-month follow-up and prospective noninterventional feasibility deployment at 12 sites (9,266 cases). The primary endpoint was AI sensitivity and specificity versus first reader using a 5% noninferiority margin. The secondary endpoints were performance versus second or consensus readers and breast-level analyses. Retrospectively, AI achieved superior sensitivity (0.541 versus 0.437 for first reader, P < 0.001) and noninferior specificity (0.943 versus 0.952, P < 0.001). Cancer detection rate increased from 7.54 to 9.33 per 1,000 women, with AI detecting 25.0% of interval cancers. Performance was particularly strong for first screens (39.3% fewer recalls, 8.8% higher detection) and invasive cancers. No systematic demographic disparities were observed. Simulated second-reader replacement reduced reading time by 32% while increasing detection by 17.7%. Prospective deployment confirmed technical feasibility but revealed a distribution shift requiring threshold recalibration. Implementation requires adaptive calibration and continuous monitoring to ensure safety and equity.
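The prospective phase found a distribution shift that required threshold recalibration. The excerpt does not describe how Google recalibrated its system, so the following is only a minimal sketch of one common approach, assuming the site has model scores for confirmed cancer-free cases: pick the lowest operating threshold that restores a target specificity locally. The names `specificity` and `recalibrate_threshold`, and all scores, are hypothetical.

```python
import math
from typing import Sequence


def specificity(negative_scores: Sequence[float], threshold: float) -> float:
    """Fraction of confirmed-negative cases NOT flagged (a case is flagged if score >= threshold)."""
    return sum(s < threshold for s in negative_scores) / len(negative_scores)


def recalibrate_threshold(negative_scores: Sequence[float], target_specificity: float) -> float:
    """Lowest threshold whose specificity on local negatives meets the target.

    Taking the lowest qualifying threshold flags as many cases as possible,
    i.e. preserves sensitivity, subject to the specificity constraint.
    """
    # Candidates: every observed score, plus a value just above the maximum,
    # which flags no negatives and therefore always reaches specificity 1.0.
    candidates = sorted(set(negative_scores))
    candidates.append(math.nextafter(candidates[-1], math.inf))
    for t in candidates:
        if specificity(negative_scores, t) >= target_specificity:
            return t
    return candidates[-1]  # not reached: the last candidate always qualifies


# Hypothetical scores for cancer-free cases at a new site, drifted upward versus development data.
site_negatives = [0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.8, 0.9]
print(specificity(site_negatives, 0.5))            # 0.3: the old threshold over-flags here
print(recalibrate_threshold(site_negatives, 0.8))  # 0.8
```

A quadratic scan keeps the sketch readable; in practice the threshold would be chosen on a sample with verified outcomes and re-checked as part of the continuous monitoring the authors call for.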
Patel B, Blumenthal D. A Novel Approach to Overseeing the Clinical Application of Generative AI. JAMA Health Forum. 2026;7(3):e256947.
[2026/3] We suggest overseeing clinical GAI through a system analogous to the preparation, evaluation, and lifelong professional oversight of human clinicians. This is not a regulatory scheme but an oversight mechanism, built on three elements: mastering foundational knowledge, demonstrating clinical competence, and continuous learning and oversight. Implementing such a clinical GAI development and oversight system would require new authorities and institutions as well as new funding mechanisms.
Luo L, Kim SE, Zhang X, et al. A clinical environment simulator for dynamic AI evaluation. Nat Med. 2026 Mar;32(3):820-827.
[2026/3] We propose the Clinical Environment Simulator (CES), a framework that evaluates clinical LLMs within digital hospital environments where every decision dynamically alters future states. Unlike current benchmarks, the CES framework requires clinical LLMs to execute decisions through realistic electronic health record interfaces, while managing trade-offs between individual patient optimization and system-wide efficiency.
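The key difference from static benchmarks is that each decision changes the state the next decision faces. The toy below is not the CES framework or its interface; it is a hypothetical sketch, assuming a two-bed ward, a fixed length of stay, and severity as a stand-in for patient benefit, showing how a policy that looks best for each patient in isolation can underperform at the system level.

```python
from typing import Callable, List

# A policy decides whether to admit one patient, given its severity and the number of free beds.
Policy = Callable[[int, int], bool]


def simulate(arrivals: List[List[int]], policy: Policy, beds: int = 2, stay_days: int = 2) -> int:
    """Run a toy ward for len(arrivals) days; return the total severity of admitted patients.

    Each admission occupies a bed for `stay_days` days, so today's decision
    changes which patients can be admitted tomorrow.
    """
    occupied: List[int] = []  # remaining days of stay for each occupied bed
    benefit = 0
    for todays_patients in arrivals:
        # Start of day: patients with one day left are discharged; the rest count down.
        occupied = [d - 1 for d in occupied if d > 1]
        for severity in todays_patients:
            free = beds - len(occupied)
            if free > 0 and policy(severity, free):
                occupied.append(stay_days)
                benefit += severity
    return benefit


def admit_everyone(severity: int, free: int) -> bool:
    # Optimizes each patient in isolation: admit whenever any bed is free.
    return True


def keep_last_bed(severity: int, free: int) -> bool:
    # Accounts for the system: reserve the last free bed for severity >= 3.
    return severity >= 3 or free > 1


arrivals = [[1, 1], [5], [1, 4], [5]]  # hypothetical severity scores arriving each day
print(simulate(arrivals, admit_everyone))  # 7
print(simulate(arrivals, keep_last_bed))   # 15
```

Scored one decision at a time, admitting every patient never looks wrong; only by rolling the environment forward does the cost of blocked beds for later, sicker patients become visible.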
Ostrer H. When Rare Is Common—Rethinking Pediatric Genetic Care. JAMA. 2026;335(15):1293–1294.
[2026/3 view] The transformation of pediatric genetic care from a niche subspecialty to a core health system responsibility requires more than technological advancement; it demands a coordinated mandate across the medical ecosystem. Health systems must integrate genetics into early diagnostics to inform therapies, supported by payers who recognize the value-based necessity of early genomic intervention. As professional societies codify universal standards and laboratories deliver rapid, variant-specific insights, clinicians must transition from reactive watchful waiting to proactive phenotype- and genotype-informed management. Electronic health record systems play a central role in managing the longitudinal needs of children with genetic disease. Embedded decision support can guide testing, surveillance, and care coordination, while standardized workflows reduce missed follow-up. Clearly assigned responsibilities within the electronic health record improve accountability and communication.
Gottlieb S, Kimbrell M. FDA’s Promising New Framework for Rare Genetic Diseases. JAMA Health Forum. 2026;7(3):e261227. doi:10.1001/jamahealthforum.2026.1227
[2026/3] FDA’s New Path for Individualized Therapies. Recognizing this shift, the FDA recently issued draft guidance outlining a regulatory framework for individualized therapies. The guidance addresses treatments designed, developed, and tested for small groups of patients, sometimes for a single individual. These bespoke, N-of-1 therapies are crafted to correct the precise genetic perturbation underlying an individual patient’s disease. Under the new policy, a sponsor may seek approval for a core genetic therapeutic platform. The anchor might be a viral vector engineered to deliver a genetic change to a particular gene, or an oligonucleotide designed to target a gene known to drive a shared disease phenotype—even when the precise variation differs from patient to patient. The FDA’s new draft policy aims to create a pathway in which personalized genetic therapies can advance with far less friction from the laboratory to the clinic and, ultimately, the market.
~

In the Eyes of GPTs
(AJ in conversation with ChatGPT to understand how machines view human healthcare)
Let’s discuss how to apply GenAI to address the challenges in rare diseases. What are the key challenges in rare diseases?
ChatGPT
Rare diseases are where GenAI could have outsized impact—but only if we’re clear-eyed about the constraints. The challenges are not just “rarity”; they’re structural across the entire healthcare system.
1. Diagnostic Delay and Missed Diagnoses
👉 Core issue: knowledge is fragmented and not accessible at the point of care
2. Extreme Data Scarcity
👉 Core issue: traditional ML assumptions (large, clean datasets) break down
3. Phenotypic and Genotypic Heterogeneity
👉 Core issue: pattern recognition is high-dimensional and weakly structured
4. Fragmented Data Across Systems
👉 Core issue: no unified patient view
5. Limited Clinical Evidence
👉 Core issue: evidence generation is too slow and underpowered
6. Therapeutic Gaps
👉 Core issue: innovation exists, but translation is bottlenecked
7. Health Inequity Is Amplified
👉 Core issue: rarity + inequity compounds invisibility
8. Lack of Continuous Learning Systems
👉 Core issue: the system does not learn from rarity
🔑 Synthesis
Rare diseases expose the fundamental limitations of today’s healthcare system:
Knowledge is sparse, data is fragmented, and learning is not continuous.
That’s exactly why this space is so important:
What does the term “precision therapies (e.g., N-of-1)” mean?
ChatGPT
“Precision therapies (e.g., N-of-1)” refers to treatments tailored to a single patient, based on that individual’s specific biology—often their exact genetic mutation.
Break it down:
Concrete example:
A child has a rare genetic disease caused by a unique mutation. Researchers design a custom antisense oligonucleotide that targets that exact mutation—no one else has the same therapy.
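Why an oligonucleotide can be specific to one patient's mutation comes down to base pairing: the antisense strand is the reverse complement of its RNA target, so a single changed base changes the matching oligo. The sketch below is a toy illustration of that pairing rule only, with made-up sequences and hypothetical function names; real ASO design involves much more, including oligo length, chemical modifications and off-target screening, and one mismatch does not by itself guarantee allele selectivity.

```python
# Watson-Crick partner in DNA for each RNA base (RNA uracil pairs with DNA adenine).
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_dna(target_rna: str) -> str:
    """Return the 5'->3' DNA sequence complementary to a 5'->3' RNA target.

    The strands pair antiparallel, so the target is reversed before each base is complemented.
    """
    return "".join(RNA_TO_DNA_PARTNER[base] for base in reversed(target_rna.upper()))


def mismatches(aso_dna: str, target_rna: str) -> int:
    """Count positions where the oligo fails to pair with the target (equal lengths assumed)."""
    return sum(a != b for a, b in zip(aso_dna, antisense_dna(target_rna)))


wild_type = "AUGGCUUAC"  # made-up 9-base target, far shorter than a real ASO binding site
variant = "AUGGAUUAC"    # same site with a hypothetical single-base change (C>A)
aso = antisense_dna(variant)
print(aso)                         # GTAATCCAT
print(mismatches(aso, variant))    # 0: pairs fully with the patient's variant
print(mismatches(aso, wild_type))  # 1
```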
This is different from traditional drugs:
Why it matters in rare diseases:
The challenge:
How can GenAI be used to create a continuous learning system for a specific rare disease? There are many patient-led networks for individual rare diseases—can a GenAI-based approach support these important patient communities?
ChatGPT
You’re pointing at exactly the right leverage point. Rare diseases already have the hardest part solved: motivated, networked patients. What’s missing is a system that can turn their fragmented experiences into continuous, clinically usable learning.
GenAI can help—but only if it’s used to build a learning loop, not just a chatbot.
What a “continuous learning system” actually means
For a specific rare disease, the goal is:
Every patient interaction → structured knowledge → shared learning → improved care for the next patient
That loop has four layers:
1. Data capture (from patient networks)
Patient groups already generate rich data:
GenAI role:
👉 This turns scattered experiences into computable patient journeys
2. Standardization and integration
Data must be harmonized across patients:
GenAI role:
👉 This creates a shared data layer across the network
3. Knowledge generation (learning from the cohort)
Once data is structured:
GenAI role:
👉 This transforms data into actionable insights
4. Feedback into care (closing the loop)
Insights must return to patients and clinicians:
GenAI role:
👉 This is where learning becomes real-world impact
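The four layers above can be sketched as a small pipeline. This is a hypothetical illustration, not an ELHS implementation: naive keyword matching stands in for the GenAI extraction step, the synonym table and plain-text phenotype labels are made up (a real system might map to a standard ontology such as the Human Phenotype Ontology), and `capture`, `standardize`, `CohortKnowledge` and `feedback` are illustrative names.

```python
from collections import Counter
from typing import Dict, List, Set

# Layer 2 vocabulary (hypothetical): family wording mapped to a canonical phenotype label.
SYNONYMS: Dict[str, str] = {
    "seizure": "Seizure",
    "fits": "Seizure",
    "low muscle tone": "Hypotonia",
    "floppy": "Hypotonia",
    "not walking": "Motor delay",
}


def capture(report: str) -> str:
    """Layer 1: accept a free-text report shared by a family and normalize its casing."""
    return report.strip().lower()


def standardize(text: str) -> Set[str]:
    """Layer 2: substring matching stands in for LLM-based term extraction."""
    return {term for phrase, term in SYNONYMS.items() if phrase in text}


class CohortKnowledge:
    """Layer 3: accumulate phenotype frequencies across the patient community."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.patients = 0

    def add(self, terms: Set[str]) -> None:
        self.patients += 1
        self.counts.update(terms)

    def frequency(self, term: str) -> float:
        return self.counts[term] / self.patients if self.patients else 0.0


def feedback(kb: CohortKnowledge, patient_terms: Set[str]) -> List[str]:
    """Layer 4: cohort features this patient has not reported, most common first,
    offered as prompts a clinician might review (not a diagnosis)."""
    return [term for term, _ in kb.counts.most_common() if term not in patient_terms]


kb = CohortKnowledge()
for report in ["My son has fits and low muscle tone",
               "Seizure at 6 months, floppy baby",
               "Seizure, not walking at 2"]:
    kb.add(standardize(capture(report)))

print(round(kb.frequency("Hypotonia"), 2))                 # 0.67
print(feedback(kb, standardize(capture("New seizures"))))  # ['Hypotonia', 'Motor delay']
```

Each new report enlarges the cohort counts that the next family's feedback draws on, which is the loop described above. Substring matching is deliberately crude (for example, "fits" would also match "benefits"), which is exactly the step where a GenAI extractor with clinician review would replace it.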
Why patient networks are uniquely powerful
Patient-led groups solve three problems that institutions struggle with:
But they lack:
GenAI can bridge exactly these gaps.
The critical design principle
The biggest mistake would be building “an AI tool for patients.”
Instead, build a learning system owned by the patient community and connected to clinicians.
That means:
Key risks to manage (important)
Bottom line
Yes—GenAI can support rare disease patient networks, but the real opportunity is bigger:
Turn each rare disease community into a self-improving learning system
~
🔹 ELHS Institute Resources 🔹
▶️ ELHS Videos
👉 For Clinical AI technology support, contact us at support@elhsi.org 📩
~ the end ~
Democratizing GenAI and LHS to Advance Global Health Equity
info@elhsi.org
Palo Alto, California, USA
