(Paving the Way for Global Health Equity with Generative AI and Learning Health Systems)

Dear Friends,
Although 2026 has placed AI at a crossroads, the new year has already seen rapid progress in clinical research and real-world applications of generative AI and Learning Health Systems (LHS). Encouraging results are emerging from all directions.
New Evidence for Clinical Applications of GenAI:
New LLMs, AI Agents, and AI Tools:
Performance Improvements in LLMs and GenAI:
Convergence of GenAI and LHS:
LLM and GenAI Benchmarking and Evaluation:
AI-Accelerated Bioscience Breakthroughs:
Unfortunately, prospective evidence using real-world data from real clinical settings remains limited. I continue to advocate for every clinician to evaluate the benefits of GenAI in improving early diagnosis and treatment of complex, uncommon, or rare diseases (CURDs). This is both practical for clinicians and essential to making AI-enabled healthcare a reality.
At the ELHS Institute, we would be happy to show you how easy it is to use GenAI in routine care to generate new clinical evidence suitable for publication. Please feel free to contact us.
Please continue reading my conversation with the ChatGPT copilot about the exciting sleep foundation model and how it may help detect early signs of cognitive decline from overnight sleep recordings.
Warm regards,
AJ
AJ Chen, PhD
Founder and PI, ELHS Institute
Silicon Valley, USA
ELHS GenAI Platform: https://elhsi.com
ELHS Newsletters: https://elhsi.org/Newsletters
~

From Page Mill
(Recent papers, news, and events showcasing the progress of GenAI and LHS)
Gommers J, Hernström V, Josefsson V, et al. Interval cancer, sensitivity, and specificity comparing AI-supported mammography screening with standard double reading without AI in the MASAI study: a randomised, controlled, non-inferiority, single-blinded, population-based, screening-accuracy trial. Lancet. 2026 Jan 31;407(10527):505-514.
[2026/1] In this Swedish randomised, controlled, non-inferiority, single-blinded, population-based screening-accuracy trial, participants were allocated 1:1 to AI-supported mammography screening (intervention) or standard double reading without AI (control). AI was used to triage examinations to single or double reading by radiologists and to support detection. Interval cancer rates were 1.55 (95% CI 1.23-1.92) and 1.76 (1.42-2.15) per 1000 participants in the intervention and control groups, respectively, a non-inferior proportion ratio of 0.88 (95% CI 0.65-1.18; p=0.41).
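For readers checking the arithmetic, the reported proportion ratio follows directly from the two per-1000 interval cancer rates. A minimal sketch; the published confidence interval and p-value come from the trial's full count data, which the abstract does not give:

```python
# Interval cancer rates per 1000 participants, as reported in the abstract
intervention_rate = 1.55   # AI-supported screening
control_rate = 1.76        # standard double reading without AI

# A ratio below 1 favors the AI-supported arm; the trial reported
# 0.88 (95% CI 0.65-1.18), meeting its non-inferiority criterion
ratio = intervention_rate / control_rate
print(round(ratio, 2))  # 0.88
```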
-
Tao X, Zhou S, Ding K, et al. An LLM chatbot to facilitate primary-to-specialist care transitions: a randomized controlled trial. Nat Med. 2026 Jan 19.
[2026/1] The trial met its primary end points with the PreA-only group showing significantly reduced physician consultation duration (28.7% reduction; 3.14 ± 2.25 min) compared to the No-PreA group (4.41 ± 2.77 min; P < 0.001), alongside significant improvements in physician-perceived care coordination (mean scores 113.1% increase; 3.69 ± 0.90 versus 1.73 ± 0.95; P < 0.001) and patient-reported communication ease (mean scores 16.0% increase; 3.99 ± 0.62 versus 3.44 ± 0.97; P < 0.001).
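The percentage changes quoted above are simple relative differences of the group means. A quick check (small discrepancies from the published 28.7% and 113.1% are expected, since the paper presumably computes from unrounded means):

```python
def relative_change(new, baseline):
    """Percent change of `new` relative to `baseline`."""
    return 100 * (new - baseline) / baseline

# Consultation duration: 3.14 min (PreA-only) vs 4.41 min (No-PreA)
print(round(-relative_change(3.14, 4.41), 1))  # 28.8 (% reduction)

# Physician-perceived care coordination: 3.69 vs 1.73
print(round(relative_change(3.69, 1.73), 1))   # 113.3 (% increase)
```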
-
Petridis, P., Margaritis, G., Stoumpou, V. et al. Holistic AI in medicine; improved performance and explainability. npj Digit. Med. 9, 120 (2026).
[2026/1 mit] To address these limitations, we introduce xHAIM (Explainable HAIM), a novel framework leveraging Generative AI to enhance both prediction and explainability through four structured steps: (1) automatically identifying task-relevant patient data across modalities, (2) generating comprehensive patient summaries, (3) using these summaries for improved predictive modeling, and (4) providing clinical explanations by linking predictions to patient-specific medical knowledge. Evaluated on the HAIM-MIMIC-MM dataset, xHAIM improves average AUC from 79.9% to 91.3% across chest pathology and operative tasks.
-
Thapa R, Kjaer MR, He B, et al. A multimodal sleep foundation model for disease prediction. Nat Med. 2026 Jan 6.
[2026/1] To address these challenges, we developed SleepFM, a multimodal sleep foundation model trained with a new contrastive learning approach that accommodates multiple PSG configurations. Trained on a curated dataset of over 585,000 hours of PSG recordings from approximately 65,000 participants across several cohorts, SleepFM produces latent sleep representations that capture the physiological and temporal structure of sleep and enable accurate prediction of future disease risk. From one night of sleep, SleepFM accurately predicts 130 conditions with a C-Index of at least 0.75 (Bonferroni-corrected P < 0.01), including all-cause mortality (C-Index, 0.84), dementia (0.85), myocardial infarction (0.81), heart failure (0.80), chronic kidney disease (0.79), stroke (0.78) and atrial fibrillation (0.78).
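The C-Index (concordance index) reported throughout measures how often the model ranks the person who develops a condition sooner as higher risk: 0.5 is chance, 1.0 is perfect ranking. A minimal sketch of Harrell's C for right-censored data, shown for illustration only (not the study's implementation, which must also handle ties and cohort-scale data):

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs in which the subject
    who experiences the event earlier also has the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if subject i has an observed
            # event and it occurs before subject j's follow-up ends
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count half
    return concordant / comparable
```

For example, scores that perfectly reverse-order the event times yield 1.0, while perfectly wrong ordering yields 0.0.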
-
Tian J, Fard P, Cagan C, et al. An autonomous agentic workflow for clinical detection of cognitive concerns using large language models. NPJ Digit Med. 2026 Jan 7;9(1):51.
[2026/1] Early detection of cognitive impairment is limited by traditional screening tools and resource constraints. We developed two large language model workflows for identifying cognitive concerns from clinical notes: (1) an expert-driven workflow with iterative prompt refinement across three LLMs (LLaMA 3.1 8B, LLaMA 3.2 3B, Med42 v2 8B), and (2) an autonomous agentic workflow coordinating five specialized agents for prompt optimization. Using LLaMA 3.1, we optimized on a balanced refinement dataset and validated on an independent dataset reflecting real-world prevalence. The agentic workflow achieved comparable validation performance (F1 = 0.74 vs. 0.81) and superior refinement results (0.93 vs. 0.87) relative to the expert-driven workflow. Sensitivity decreased from 0.91 to 0.62 between datasets, demonstrating the impact of prevalence shift on generalizability. Expert re-adjudication revealed 44% of apparent false negatives reflected clinically appropriate reasoning. These findings demonstrate that autonomous agentic systems can approach expert-level performance while maintaining interpretability, offering scalable clinical decision support.
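The F1 drop under prevalence shift has a mechanical explanation: F1 depends on precision, and precision falls when the condition is rarer, even if the classifier itself is unchanged. A toy illustration with hypothetical numbers (not the study's data):

```python
def f1(tp, fp, fn):
    """F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# The same classifier (sensitivity 0.9, specificity 0.9) applied to two
# populations of 1000 notes that differ only in prevalence.
for prevalence in (0.5, 0.1):
    positives = 1000 * prevalence
    negatives = 1000 - positives
    tp = 0.9 * positives   # sensitivity 0.9
    fn = 0.1 * positives
    fp = 0.1 * negatives   # specificity 0.9 -> 10% of negatives flagged
    print(prevalence, round(f1(tp, fp, fn), 2))
# 0.5 -> F1 = 0.9; 0.1 -> F1 = 0.64, with no change to the model
```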
-
Bedi S, Cui H, Fuentes M, et al. Holistic evaluation of large language models for medical tasks with MedHELM. Nat Med. 2026 Jan 20.
[2026/1] Here we introduce MedHELM, an extensible evaluation framework with three contributions. First, a clinician-validated taxonomy organizing medical AI applications into five categories that mirror real clinical tasks—clinical decision support (diagnostic decisions, treatment planning), clinical note generation (visit documentation, procedure reports), patient communication (education materials, care instructions), medical research (literature analysis, clinical data analysis) and administration (scheduling, workflow coordination). These encompass 22 subcategories and 121 specific tasks reflecting daily medical practice. Second, a comprehensive benchmark suite of 37 evaluations covering all subcategories. Third, systematic comparison of nine frontier LLMs—Claude 3.5 Sonnet, Claude 3.7 Sonnet, DeepSeek R1, Gemini 1.5 Pro, Gemini 2.0 Flash, GPT-4o, GPT-4o mini, Llama 3.3 and o3-mini—using an automated LLM-jury evaluation method.
-
Del Fiol G, Borsato E, Bradshaw RL, et al. GARDE-Chat: a scalable, open-source platform for building and deploying health chatbots. J Am Med Inform Assoc. 2026 Jan 10:ocaf211.
[2026/1] GARDE-Chat is an open-source platform designed to facilitate the development, deployment, and dissemination of chatbot-based digital health interventions across different domains and settings. Since its first release in July 2022, GARDE-Chat has supported the development of chatbot-based interventions tested in multiple studies, including large pragmatic clinical trials addressing topics such as genetic testing, COVID-19 testing, tobacco cessation, and cancer screening.
-
Curcin V, Delaney B, Alkhatib A, et al. Learning Health Systems provide a glide path to safe landing for AI in health. Artif Intell Med. 2026 Mar;173:103346.
[2025/12] The paper outlines practical steps to align AI with LHS frameworks, including investment in data infrastructure, continuous model monitoring, and fostering a learning culture. Embedding AI in LHSs transforms implementation from a one-time event into a sustained, evidence-based learning process that aligns innovation with clinical realities, ultimately advancing patient care, health equity, and system resilience.
-
Avsec Ž, Latysheva N, Cheng J, Novati G, et al. Advancing regulatory variant effect prediction with AlphaGenome. Nature. 2026 Jan;649(8099):1206-1218.
[2026/1] We present AlphaGenome, a unified DNA sequence model, which takes as input 1 Mb of DNA sequence and predicts thousands of functional genomic tracks up to single-base-pair resolution across diverse modalities. The modalities include gene expression, transcription initiation, chromatin accessibility, histone modifications, transcription factor binding, chromatin contact maps, splice site usage and splice junction coordinates and strength.
-
Truhn D, Azizi S, Zou J, et al. Artificial intelligence agents in cancer research and oncology. Nat Rev Cancer. 2026 Jan 12.
[2026/1] Here we provide a primer on AI agents for cancer researchers and oncologists. We illustrate how this technology is set apart from and goes beyond traditional AI systems. We discuss existing and emerging applications in cancer research and address real-world challenges from the perspective of academic, clinical and industrial research.
~

In the Eyes of GPTs
(AJ in conversation with ChatGPT to understand how machines view human healthcare)
Now let’s discuss Stanford’s SleepFM study.
ChatGPT
📘 Overview of the SleepFM Study (Thapa et al., Nature Medicine, 2026)
Study Goal:
The paper reports the development of SleepFM, a multimodal sleep foundation model that learns comprehensive physiological patterns from overnight sleep recordings (polysomnography or PSG) and uses them to predict future disease risk.
Data & Training:
Methodology:
🔍 Key Findings
Predictive Performance:
Notable High Performers:
Independent reports highlighted particularly strong performance for conditions such as Parkinson’s disease (C-Index ~0.89) and prostate & breast cancers (C-Index ~0.87-0.89).
Physiological Insights:
🧠 What This Means
Unbelievable! How can it predict neurological diseases like Alzheimer’s, dementia, and Parkinson’s with such high accuracy? What key data are required to make these predictions?
ChatGPT
It does feel unbelievable at first glance — but mechanistically it is plausible. Let’s break it down carefully.
🧠 Why Can SleepFM Predict Neurological Diseases?
Sleep is not just “rest.” It is a high-bandwidth neurophysiological stress test of the brain.
Neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease, and other forms of dementia begin 10–20 years before clinical diagnosis. During that preclinical phase, subtle dysfunction already affects:
All of these are measurable during sleep.
🔬 What Physiological Signals Contain Predictive Information?
SleepFM uses full polysomnography (PSG) — not just sleep duration or apnea index.
1. EEG (Brain Activity) — The Most Important Signal
EEG provides:
In Alzheimer’s:
In Parkinson’s:
These changes occur years before symptoms.
2. Autonomic Signals (ECG + Respiratory)
These include:
Neurodegeneration affects:
SleepFM likely captures multimodal coupling patterns, not single features.
3. EMG (Muscle Tone)
Especially critical for Parkinson’s risk:
RBD can precede Parkinson’s by 10–15 years.
🧩 Why a Foundation Model Works So Well
Traditional sleep metrics:
SleepFM instead learns:
This is analogous to how LLMs detect semantic structure beyond keywords.
The model likely identifies distributed weak signals that humans cannot manually quantify.
📊 What Data Are Required to Make Predictions?
Based on the paper design:
Required Inputs:
Not Required:
It predicts future diagnosis from longitudinal EHR linkage, not from current symptoms.
🚨 Important Caveats
High C-index ≠ clinical readiness.
Key questions:
Remember:
Sleep architecture strongly correlates with age — and age is the biggest risk factor for neurodegeneration.
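That caveat can be made concrete with a toy simulation (hypothetical, not the study's analysis): when disease timing is driven mostly by age, scoring risk by age alone already produces a high concordance index. A sleep model must beat this baseline to show it is reading physiology rather than age:

```python
import random

random.seed(0)

# Toy cohort: event time shrinks with age, plus physiological noise
n = 200
ages = [random.uniform(40, 90) for _ in range(n)]
event_times = [100 - a + random.gauss(0, 5) for a in ages]

# Use age itself as the "risk score" and count concordant pairs
# (every subject has an observed event here, so all ordered pairs count)
concordant = comparable = 0
for i in range(n):
    for j in range(n):
        if event_times[i] < event_times[j]:  # subject i fails first
            comparable += 1
            if ages[i] > ages[j]:
                concordant += 1

print(round(concordant / comparable, 2))  # well above the 0.5 chance level
```

This is why the external-validation and incremental-value questions above matter before reading a high C-Index as a physiological discovery.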
💡 Big Picture Insight
Sleep may function as a whole-brain stress test, compressing:
Into measurable signals.
In that sense, sleep is a biological “language” — and SleepFM is a foundation model trained to read it.
I hope I will have a chance to try it soon. Forget about fortune tellers — SleepFM may be able to tell me my health fortune instead.
~
🔹 ELHS Institute Resources 🔹
▶️ ELHS Videos
👉 For Clinical AI technology support, contact us at support@elhsi.org 📩
~ the end ~
Democratizing GenAI and LHS to Advance Global Health Equity
info@elhsi.org
Palo Alto, California, USA
