(Paving the Way for Global Health Equity with Generative AI and Learning Health Systems)
Dear Friends,
Generative AI is having a profound, and sometimes unsettling, impact on our society. It may eliminate or diminish many jobs across different industries, while at the same time creating new roles. In healthcare, GenAI may require the creation of new positions in care delivery in order to function optimally and improve care quality and patient outcomes.
Harvard professors have proposed one such role: the Algorithmic Consultant (see the npj Digital Medicine paper below). They envision this role as similar to that of clinical pharmacists in hospitals today. An algorithmic consultant would have two primary responsibilities: providing point-of-care consultations with physicians who seek input from AI systems on specific clinical scenarios, and overseeing and managing the hospital's broader ecosystem of algorithms.
Since I have been studying the convergence of GenAI and the Learning Health System (LHS) vision proposed by the U.S. National Academy of Medicine (NAM), I believe a much larger and deeper structural change is needed in healthcare before GenAI’s full potential can be realized. One such change would be the creation of a new AI Specialist role, analogous to pathologists or radiologists. This specialist would generate AI reports that physicians can use as additional information to support differential diagnoses and personalized treatment planning. The AI Specialist role could be added to existing Medical Informatics departments or placed in a new department dedicated to predictive care.
These AI reports could be applied in multiple clinical settings. Beyond the obvious outpatient and inpatient points of care, they could also be used for automated, daily background monitoring of clinical decisions in diagnosis and treatment. Every specialty department could implement daily monitoring across all conditions within its scope, reminding physicians to consider overlooked disease causes or confirm necessary diagnostic tests. This, in turn, would improve clinical training for young doctors and raise the overall quality of care. Such broad monitoring was recommended in the 2015 NAM report Improving Diagnosis in Health Care—yet few, if any, hospitals have been able to implement it, largely because the resource demands were prohibitive.
In my 2024 review of GenAI democratization in healthcare, I highlight these hospital-wide monitoring tasks as an example of GenAI's unique potential roles in care delivery. The review also proposes LHS units as a structural guarantee, a concept I first published in Scientific Reports in 2022.
Below, I discuss with my copilot, ChatGPT, the emerging roles of GenAI in learning health systems. I hope you enjoy these machine-human conversations.
Warm regards,
AJ
AJ Chen, PhD
Founder and PI, ELHS Institute
Silicon Valley, USA
🔗 ELHS Newsletters: https://elhsi.org/Newsletters
🔗 ELHS GenAI Platform: https://elhsi.com
~
From Page Mill
(Recent papers, news, and events showcasing the progress of GenAI and LHS)
Marwaha, J.S., Yuan, W., Poddar, M. et al. The algorithmic consultant: a new era of clinical AI calls for a new workforce of physician-algorithm specialists. npj Digit. Med. 8, 552 (2025).
[2025/8] As complex AI systems become more common in clinical decision-making, a new type of physician-data science specialist is needed to bridge the gap between these AI tools and practicing clinicians. Analogous to how clinical pharmacists currently guide appropriate medication use and govern a hospital’s formulary, this specialist would offer point-of-care guidance on AI tool selection and interpretation, and manage a hospital’s AI systems. This role aims to enable safe and effective clinical AI by collaborating with patient-facing providers to ensure appropriate model application in the real world.
Li, R., Wang, X., Berlowitz, D. et al. CARE-AD: a multi-agent large language model framework for Alzheimer’s disease prediction using longitudinal clinical notes. npj Digit. Med. 8, 541 (2025).
[2025/8] In this study, we present CARE-AD (Collaborative Analysis and Risk Evaluation for Alzheimer’s Disease), a multi-agent LLM-based framework for forecasting Alzheimer’s disease (AD) onset by analyzing longitudinal electronic health record (EHR) notes. CARE-AD assigns specialized LLM agents to extract signs and symptoms relevant to AD and conduct domain-specific evaluations—emulating a collaborative diagnostic process. In a retrospective evaluation, CARE-AD achieved higher accuracy (0.53 vs. 0.26–0.45) than baseline single-model approaches in predicting AD risk 10 years prior to the first recorded diagnosis code. These findings highlight the feasibility of using multi-agent LLM systems to support early risk assessment for AD and motivate further research on their integration into clinical decision support workflows.
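The multi-agent pattern CARE-AD describes can be illustrated with a minimal sketch: specialized "agents" each score one clinical domain from longitudinal notes, and a coordinator aggregates their scores into an overall risk estimate. The keyword heuristic, agent names, and aggregation rule below are hypothetical stand-ins for the paper's LLM-based agents, shown only to make the collaborative structure concrete.

```python
# Sketch of a multi-agent risk-evaluation loop (illustrative, not CARE-AD's code).
from dataclasses import dataclass

@dataclass
class DomainAgent:
    name: str
    keywords: tuple  # hypothetical proxy for an LLM agent's domain-specific extraction

    def assess(self, notes: list[str]) -> float:
        """Return a 0-1 score: fraction of notes mentioning a domain-relevant sign."""
        hits = sum(any(k in note.lower() for k in self.keywords) for note in notes)
        return hits / len(notes) if notes else 0.0

def coordinator(agents: list[DomainAgent], notes: list[str]) -> dict:
    """Aggregate per-domain scores into an overall risk (simple mean here)."""
    scores = {a.name: a.assess(notes) for a in agents}
    scores["overall"] = sum(scores.values()) / len(agents)
    return scores

agents = [
    DomainAgent("memory", ("forgetful", "memory loss")),
    DomainAgent("language", ("word-finding", "aphasia")),
    DomainAgent("function", ("needs assistance", "adl decline")),
]
notes = [
    "Patient reports being forgetful; word-finding difficulty noted.",
    "Follow-up: memory loss worsening, needs assistance with finances.",
]
print(coordinator(agents, notes))
```

In the actual framework each agent is an LLM reasoning over the note text, and the coordinator emulates a case-conference discussion rather than taking a mean.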
Khasentino, J., Belyaeva, A., Liu, X. et al. A personal health large language model for sleep and fitness coaching. Nat Med (2025).
[2025/8] Here we introduce the Personal Health Large Language Model (PH-LLM), designed for applications in sleep and fitness. PH-LLM is a version of the Gemini LLM that was finetuned for text understanding and reasoning when applied to aggregated daily-resolution numerical sensor data. In a comprehensive evaluation involving 857 real-world case studies, PH-LLM performed similarly to human experts for fitness-related tasks and improved over the base Gemini model in providing personalized sleep insights. Finally, PH-LLM effectively predicted self-reported sleep quality using a multimodal encoding of wearable sensor data, further demonstrating its ability to effectively contextualize wearable modalities.
Wang, Z., Cao, L., Danek, B. et al. Accelerating clinical evidence synthesis with large language models. npj Digit. Med. 8, 509 (2025).
[2025/8] Here, we propose a generative artificial intelligence (AI) pipeline named TrialMind to streamline study search, study screening, and data extraction tasks in SR. We chose published SRs to build TrialReviewBench, which contains 100 SRs and 2,220 clinical studies. For study search, it achieves high recall rates (ours 0.711–0.834 vs. human baseline 0.138–0.232). For study screening, TrialMind outperforms previous document-ranking methods by a 1.5–2.6-fold change. For data extraction, it outperforms GPT-4's accuracy by 16–32%. In a pilot study, human-AI collaboration with TrialMind improved recall by 71.4% and reduced screening time by 44.2%, while in data extraction, accuracy increased by 23.5% with a 63.4% time reduction. Medical experts preferred TrialMind's synthesized evidence over GPT-4's in 62.5%–100% of cases. These findings show the promise of accelerating clinical evidence synthesis driven by human-AI collaboration.
Luo, MJ., Bi, S., Pang, J. et al. A large language model digital patient system enhances ophthalmology history taking skills. npj Digit. Med. 8, 502 (2025).
[2025/8] To address this, we developed a large language model-based digital patient (LLMDP) system that transforms de‑identified electronic health records into voice‑enabled virtual patients capable of free‑text dialog and adaptive feedback, based on our previously established open-source retrieval-augmented framework. In a single‑center randomized controlled trial (ClinicalTrials.gov: NCT06229379; N = 84), students trained with LLMDP achieved a 10.50-point increase in medical history-taking assessment scores (95% CI: 4.66–16.33, p < 0.001) compared to those using traditional methods.
Wu, J., Wang, Y., Zhong, Z. et al. Vision-language foundation model for 3D medical imaging. npj Artif. Intell. 1, 17 (2025).
[2025/8] Recent advances in AI, especially vision-language foundation models (VLFMs), show promise in automating radiology report generation from complex 3D medical imaging data. Our review analyzes 23 studies on VLFMs, focusing on model architectures, capabilities, training datasets, and evaluation metrics. We discuss AI’s evolution in radiology, emphasizing the need for diverse datasets and standardized metrics, as challenges remain in producing consistent, high-quality reports.
You JG, Dbouk RH, Landman A, et al. Ambient Documentation Technology in Clinician Experience of Documentation Burden and Burnout. JAMA Netw Open. 2025;8(8):e2528056. doi:10.1001/jamanetworkopen.2025.28056
[2025/8] This study examined the prevalence of ambient documentation technology (ADT) use and its association with clinicians' experience of documentation burden and burnout before and after use. A total of 128 of 264 MGB survey respondents (48.5%) had self-reported using ADT for at least 50% of their patient visits, while 27 of 62 Emory respondents (43.5%) self-reported usage for most or all of their visits. The proportion of MGB clinicians reporting burnout (Professional Fulfillment Index overall burnout score >1.33) decreased from 50.6% to 29.4% (χ2 = 42.4; P < .001) at 42 days and from 52.6% to 30.7% (χ2 = 32.7; P < .001) at 84 days. The proportion of Emory clinicians reporting a positive impact of their documentation practice on well-being (defined as a score of 3-4 for positive or very positive) improved from 1.6% to 32.3% (χ2 = 19.0; P < .001). These findings suggest that ADT may enhance clinicians' documentation-related experience and reduce burnout.
Bedi S, Jiang Y, Chung P, et al. Fidelity of Medical Reasoning in Large Language Models. JAMA Netw Open. 2025;8(8):e2526021. doi:10.1001/jamanetworkopen.2025.26021
[2025/8] We evaluated 6 models spanning different architectures and capabilities: DeepSeek-R1 (model 1) and o3-mini (model 2), both reasoning models; Claude-3.5 Sonnet (model 3); Gemini-2.0-Flash (model 4); GPT-4o (model 5); and Llama-3.3-70B (model 6). When forced to reason beyond familiar answer patterns, all models demonstrate declines in accuracy, challenging claims of artificial intelligence's readiness for autonomous clinical deployment.
Giglio B, Albeloushi A, Alhaj AK, et al. Artificial Intelligence–Augmented Human Instruction and Surgical Simulation Performance: A Randomized Clinical Trial. JAMA Surg. August 06, 2025. doi:10.1001/jamasurg.2025.2564
[2025/8] In this randomized clinical trial of 88 medical students, trainees achieved significantly higher performance scores when tutored by a human educator providing personalized feedback based on artificial intelligence error data than by an intelligent tutor alone. Providing human educators with artificial intelligence performance data to tailor feedback improves learning outcomes in surgical simulation training.
Sahhar M, Nassar JE, Rouffiac A, et al. Enhanced Surgical Instructions Using Large Language Models. JAMA Surg. August 13, 2025. doi:10.1001/jamasurg.2025.2825
[2025/8] Clear preoperative and postoperative instructions are essential, yet some patients struggle to comprehend them, increasing their risk for complications. These risks can be mitigated through simplified materials, such as the Easy-to-Read library (Elsevier), the only known large-scale effort to offer professionally simplified surgical instructions through electronic medical records (EMRs). This study evaluates the hypothesis that artificial intelligence can simplify surgical instructions as effectively as Easy-to-Read versions, with respect to readability, fidelity, and safety metrics.
Jiang Y. et al. MedAgentBench: A Virtual EHR Environment to Benchmark Medical LLM Agents. NEJM AI 2025;2(9). DOI: 10.1056/AIdbp2500144
[2025/8] MedAgentBench encompasses 300 patient-specific clinically derived tasks from 10 categories written by human physicians, realistic profiles of 100 patients with over 700,000 data elements, a Fast Healthcare Interoperability Resources–compliant interactive environment, and an accompanying codebase. The environment uses standard application programming interfaces and communication infrastructure used in modern electronic health record (EHR) systems so that it can be easily migrated into live EHR systems.
~
In the Eyes of GPTs
(AJ in conversation with ChatGPT to understand how machines view human healthcare)
As LLMs like ChatGPT become more accurate in predicting diseases, physicians interacting with these models directly, without specialist support, may not achieve optimal results. It is clear that clinical AI should be treated as a new specialty—similar to pathology or radiology—where a complex technology provides critical insights for diagnosing and treating diseases. Do you consider Clinical GenAI a new specialty in medicine?
ChatGPT
At this point, most medical systems and professional societies do not officially recognize “clinical GenAI” as a specialty, but the analogy you raise is important.
👉 So: Not yet a specialty, but conceptually on a path that mirrors the emergence of radiology and pathology.
Good analysis. If Clinical GenAI evolves into a medical specialty, what new professional roles should be created in health care delivery?
ChatGPT
If Clinical GenAI evolves into a medical specialty, health systems would need to create new professional roles that parallel radiology/pathology (where specialized physicians and allied staff form an ecosystem). Likely roles include:
1. Clinical GenAI Specialist (Physician)
2. AI Diagnostic Interpreter / Consultant
3. Clinical GenAI Data Scientist
4. AI Quality & Safety Officer
5. GenAI Clinical Educator
6. AI Workflow Engineer
7. Patient-facing AI Navigator
8. GenAI Regulatory & Ethics Specialist
Analogy to Radiology/Pathology
Thank you for the comprehensive list of roles. Before Clinical GenAI is officially accepted as a specialty, it will likely be tested through different roles. A recent npj Digital Medicine perspective proposes an "algorithmic consultant" role. I have designed a GenAI-enabled navigator role for monitoring and improving care quality, which won a prize in a global challenge competition. Currently, our collaborative clinical study is evaluating GenAI in the role of providing AI diagnostic reports to support physicians in making more complete diagnoses and treatment plans. What do you think of these roles at the current stage?
ChatGPT
I think your framing is exactly right: before “Clinical GenAI” can mature into a recognized specialty, the field will advance through role experimentation. The three roles you mention—algorithmic consultant, GenAI-enabled navigator, and AI diagnostic report provider—each test different aspects of feasibility, safety, and value.
1. Algorithmic Consultant (as proposed in Nature)
2. GenAI-enabled Navigator (your design)
3. AI Diagnostic Report Provider (your clinical study)
Synthesis: Why these roles matter now
~
🚀 Healthcare GenAI Toolbox
Empowering physicians to accelerate the responsible use of GenAI in healthcare through three growth stages: 🚀 1. Quick Start – Access GenAI agents preclinically validated with synthetic data to enhance routine clinical decision-making. 🛠️ 2. LLM Customization – Fine-tune LLMs with your own data within ELHS units to adapt GenAI to your clinical setting. 🌍 3. Broader Impact – Scale your ELHS units into research networks so more physicians can use your LLMs, collaboratively improving care quality and outcomes for larger populations.
Creating a positive future by making innovation available to all and enabling advancements that can enhance our lives and spur economic growth. Open-source AI provides tools for more people to build awesome things and expand our horizons.
~
🔹 ELHS Institute Relevant Resources 🔹
▶️ ELHS Videos
👉 For Clinical AI technology support, contact us at support@elhsi.org 📩
~ the end ~
Democratizing GenAI and LHS to Advance Global Health Equity
info@elhsi.org
Palo Alto, California, USA