The Future of Healthcare with GenAI – Insights from Dr. Isaac Kohane

 
ELHS Newsletter 2025-03-04

Subscribe | All Issues

 

Dear Friends,

 

I anticipate fundamental changes in healthcare as generative AI (GenAI) becomes more democratized. But where will these changes take place? One way to find out is by examining expert perspectives from leading voices in the field. One of the most urgent global healthcare challenges is the severe shortage of healthcare professionals—an estimated shortfall of 10 million by 2030, according to the WHO. To explore how GenAI might help address this issue, let’s turn to Dr. Isaac Kohane, Professor and Chair of Biomedical Informatics at Harvard Medical School.

 

In a recent interview with NEJM AI, Dr. Kohane discussed how GenAI could empower physician assistants and nurse practitioners to help bridge the functional gap in primary care. In his NEJM perspective article, he urges clinical researchers to compare real-world AI-assisted outcomes to existing healthcare systems, rather than an idealized version of healthcare that doesn’t yet exist. Dr. Kohane has also proposed an Alignment Compliance Index—a framework for measuring how well AI models align with human values, though this concept is still under development.

 

Another key insight from his interview is the absence of a critical role in today’s healthcare system—a dedicated individual responsible for ensuring that patients receive necessary care and adhere to treatment plans. This idea echoes the Guardian Angel concept proposed by his advisor at MIT over 30 years ago, which envisioned AI assisting in personalized patient care.

 

I believe Dr. Kohane’s perspectives offer practical and actionable ways to leverage GenAI at this early stage of healthcare transformation. At ELHS Institute, we share a similar vision. Our approach focuses on fine-tuning open-source LLMs so that clinical teams can more quickly evaluate GenAI for real-world patient care, ensuring AI applications are rigorously tested against real-world data from electronic health records.

 

AI Advancing Healthcare: February Highlights

In February, new evidence emerged on how AI is enhancing standards of care across screening, diagnosis, treatment, and clinical trials. Some key findings include:

  • LLMs improving physician performance in patient care tasks, assisting in radiological differential diagnosis, supporting gastroenterologists in diagnosing complex cases, predicting progression to schizophrenia or bipolar disorder, and estimating postoperative risks.

  • AI training inexperienced clinicians to perform expert-level lung ultrasound imaging.

  • LLMs outperforming professionals in pre-screening patients for clinical trials.

 

Explore the latest research papers below, and enjoy my conversation with ChatGPT about Dr. Kohane’s expert insights.

 

Warm regards,

AJ
AJ Chen, PhD | ELHS Institute | https://elhsi.org
 

~

 

 

Source: Dr. Isaac Kohane, Harvard Medical School

 

From Page Mill

 

Huo B, Boyle A, Marfo N, et al. Large Language Models for Chatbot Health Advice Studies: A Systematic Review. JAMA Netw Open. 2025;8(2):e2457879.

[2025/2] In this systematic review of 137 articles, 99.3% of the studies assessed closed-source models and did not provide enough information to identify the LLM. Most (64.5%) studies used subjective means as the ground truth to define the successful performance of the LLM, while less than a third addressed the ethical, regulatory, and patient safety implications of clinically integrating LLMs. The findings of this study suggest that the extent of reporting varies considerably among studies evaluating the clinical accuracy of LLMs providing health advice.

 

Kim, S.H., Schramm, S., Adams, L.C. et al. Benchmarking the diagnostic performance of open source LLMs in 1933 Eurorad case reports. npj Digit. Med. 8, 97 (2025).

[2025/2] This study evaluated the diagnostic performance of fifteen open-source LLMs and one closed-source LLM (GPT-4o) in 1,933 cases from the Eurorad library. LLMs provided differential diagnoses based on clinical history and imaging findings. Responses were considered correct if the true diagnosis appeared in the top three suggestions. Models were further tested on 60 non-public brain MRI cases from a tertiary hospital to assess generalizability. In both datasets, GPT-4o demonstrated superior performance, closely followed by Llama-3-70B, showing that open-source LLMs are rapidly closing the gap with proprietary models. The findings highlight the potential of open-source LLMs as decision support tools for radiological differential diagnosis in challenging, real-world cases.
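
To illustrate the top-three scoring protocol described above, here is a minimal Python sketch. It is not the authors' code; the case data are invented, and the exact-string matching is a simplifying assumption, since the study likely relied on expert adjudication of matches.

from dataclasses import dataclass

@dataclass
class Case:
    true_diagnosis: str          # reference diagnosis from the case report
    llm_suggestions: list[str]   # model's differential diagnoses, ranked best first

def top3_accuracy(cases: list[Case]) -> float:
    """Fraction of cases whose true diagnosis appears among the model's top three suggestions."""
    hits = 0
    for case in cases:
        top3 = [s.strip().lower() for s in case.llm_suggestions[:3]]
        if case.true_diagnosis.strip().lower() in top3:
            hits += 1
    return hits / len(cases) if cases else 0.0

# Hypothetical usage with two made-up cases:
cases = [
    Case("glioblastoma", ["glioblastoma", "brain metastasis", "primary CNS lymphoma"]),
    Case("meningioma", ["schwannoma", "pituitary adenoma", "arachnoid cyst"]),
]
print(f"Top-3 accuracy: {top3_accuracy(cases):.2f}")  # prints 0.50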

 

Goh, E., Gallo, R.J., Strong, E. et al. GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. Nat Med (2025).

[2025/2] From November 2023 to April 2024, 92 practicing physicians were randomized to use either GPT-4 plus conventional resources or conventional resources alone to answer five expert-developed clinical vignettes in a simulated setting. All cases were based on real, de-identified patient encounters, with information revealed sequentially to mirror the nature of clinical environments. The primary outcome was the difference in total score between groups on expert-developed scoring rubrics. Secondary outcomes included domain-specific scores and time spent per case. Physicians using the LLM scored significantly higher compared to those using conventional resources (mean difference = 6.5%, 95% confidence interval (CI) = 2.7 to 10.2, P < 0.001). LLM users spent more time per case (mean difference = 119.3 s, 95% CI = 17.4 to 221.2, P = 0.02). There was no significant difference between LLM-augmented physicians and LLM alone (−0.9%, 95% CI = −9.0 to 7.2, P = 0.8). LLM assistance can improve physician management reasoning in complex clinical vignettes compared to conventional resources and should be validated in real clinical practice. 

 

Hansen L, Bernstorff M, Enevoldsen K, et al. Predicting Diagnostic Progression to Schizophrenia or Bipolar Disorder via Machine Learning. JAMA Psychiatry. Published online February 19, 2025.

[2025/2] This study investigated whether machine learning models trained on routine clinical data from electronic health records (EHRs) can predict diagnostic progression to schizophrenia or bipolar disorder among patients undergoing treatment in psychiatric services for other mental illness. Predicting schizophrenia separately yielded better performance (AUROC, 0.80; 95% CI, 0.79-0.81; sensitivity, 19.4%; specificity, 96.3%; PPV, 10.8%) than predicting bipolar disorder (AUROC, 0.62; 95% CI, 0.61-0.63; sensitivity, 9.9%; specificity, 96.2%; PPV, 8.4%). Clinical notes proved particularly informative for prediction.
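
For readers less familiar with the operating-point metrics reported here, the following minimal Python sketch (illustrative only; the labels, risk scores, and threshold are made up, not the study's data) shows how sensitivity, specificity, and PPV are derived from thresholded model outputs.

def operating_point_metrics(y_true, risk_scores, threshold):
    """Compute sensitivity, specificity, and PPV from binary labels and risk scores."""
    y_pred = [1 if s >= threshold else 0 for s in risk_scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # true progressors correctly flagged
    specificity = tn / (tn + fp)   # non-progressors correctly not flagged
    ppv = tp / (tp + fp)           # precision of a positive prediction
    return sensitivity, specificity, ppv

# Hypothetical example: 1 = later progressed to schizophrenia, scores from a model.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
scores = [0.05, 0.20, 0.85, 0.10, 0.40, 0.15, 0.70, 0.90]
print(operating_point_metrics(y_true, scores, threshold=0.5))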

 

Horvat CM, Barda AJ, Perez Claudio E, et al. Interoperable Models for Identifying Critically Ill Children at Risk of Neurologic Morbidity. JAMA Netw Open. 2025;8(2):e2457469.

[2025/2] In this prognostic study, the authors developed well-performing models for predicting which children with critical illness were at risk of neurologic morbidity. A flexible, distributed model-development strategy, carried out in partnership with an external validation site, demonstrated the utility of adapting to varied informatics infrastructures and EHR deployments to generate well-performing predictive models for a common clinical goal. A generalizable model demonstrated robust performance in external validation. Prospective, multisite assessment of a generalizable model coupled with brain injury-based biomarkers is warranted to assess their combined utility for identifying patients at high risk of neurologic morbidity and for evaluating interventions to improve outcomes in this population.

 

Alba, C., Xue, B., Abraham, J. et al. The foundational capabilities of large language models in predicting postoperative risks using clinical notes. npj Digit. Med. 8, 95 (2025).

[2025/2] Using 84,875 preoperative notes and their associated surgical cases from 2018 to 2021, the authors examined the performance of LLMs in predicting six postoperative risks under various fine-tuning strategies. Pretrained LLMs outperformed traditional word embeddings by an absolute AUROC of 38.3% and AUPRC of 33.2%. Self-supervised fine-tuning further improved performance by 3.2% and 1.5%. Incorporating labels into training increased AUROC by a further 1.8% and AUPRC by 2%. The highest performance was achieved with a unified foundation model, with improvements of 3.6% in AUROC and 2.6% in AUPRC over self-supervision alone, highlighting the foundational capabilities of LLMs in predicting postoperative risks and their potential benefit when deployed for perioperative care.
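
Since the comparison above is framed entirely in terms of AUROC and AUPRC, here is a minimal Python sketch (not from the paper; the labels and predicted probabilities are invented for illustration) showing how the two metrics are computed for competing models with scikit-learn.

from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical held-out labels (1 = postoperative complication occurred) and
# predicted risk probabilities from two modeling strategies.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
p_word_embedding = [0.20, 0.40, 0.35, 0.10, 0.55, 0.45, 0.30, 0.50]
p_finetuned_llm = [0.10, 0.25, 0.70, 0.15, 0.80, 0.65, 0.20, 0.75]

for name, probs in [("word embeddings", p_word_embedding), ("fine-tuned LLM", p_finetuned_llm)]:
    auroc = roc_auc_score(y_true, probs)             # area under the ROC curve
    auprc = average_precision_score(y_true, probs)   # area under the precision-recall curve
    print(f"{name}: AUROC={auroc:.3f}, AUPRC={auprc:.3f}")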

 

Yang, X., Li, T., Wang, H. et al. Multiple large language models versus experienced physicians in diagnosing challenging cases with gastrointestinal symptoms. npj Digit. Med. 8, 85 (2025).

[2025/2] An offline dataset of 67 challenging cases with primary gastrointestinal symptoms was used to solicit possible diagnoses from seven LLMs and 22 gastroenterologists. The diagnoses by Claude 3.5 Sonnet covered the highest proportion of instructive diagnoses (76.1%; 95% confidence interval [CI], 70.6%–80.9%), significantly surpassing all the gastroenterologists (p < 0.05 for all). Claude 3.5 Sonnet also achieved a significantly higher coverage rate than the gastroenterologists using search engines or other traditional resources (76.1% [70.6%–80.9%] vs. 45.5% [40.7%–50.4%], p < 0.001). The study highlights that advanced LLMs may assist gastroenterologists with instructive, time-saving, and cost-effective diagnostic scopes in challenging cases.

 

Baloescu C, Bailitz J, Cheema B, et al. Artificial Intelligence–Guided Lung Ultrasound by Nonexperts. JAMA Cardiol. Published online January 15, 2025.

[2025/1] Can artificial intelligence (AI) aid health care clinicians inexperienced in lung ultrasound (LUS) in obtaining high-quality LUS clips? In this multicenter diagnostic validation study among adults with shortness of breath, 98.3% of ultrasound examinations performed by trained health care professionals with AI guidance were of sufficient quality to meet diagnostic standards and were not statistically different compared with images acquired by LUS experts without AI guidance. With AI assistance, trained novices can produce expert-level images that can be used to assess pathology after a short training session, potentially enhancing access to LUS in resource-constrained settings.

 

Unlu O, Varugheese M, Shin J, et al. Manual vs AI-Assisted Prescreening for Trial Eligibility Using Large Language Models—A Randomized Clinical Trial. JAMA. Published online February 17, 2025. 

[2025/2] The eligibility rate was 20.4% (458/2242 patients) for the AI-assisted screening method vs 12.7% (284/2234 patients) for the manual screening method (P < .001). At the end of the trial, there were 35 enrollments (1.6%) using the AI-assisted screening method compared with 19 enrollments (0.9%) using the manual screening method (subdistribution hazard ratio, 1.79 [95% CI, 1.02-3.15], P = .04). The AI-assisted patient screening using the RECTIFIER tool significantly improved the rate of eligibility determination and enrollment compared with manual screening in a randomized clinical HF trial. These results provide clinical trials with technology that could transform study operations.

 

~

 

 

 

In the Eyes of GPTs

 

  What do you know about Dr. Isaac Kohane at Harvard Medical School and his expert views on healthcare AI?

 

  ChatGPT

Dr. Isaac (Zak) Kohane is a prominent figure in biomedical informatics, currently serving as the inaugural Chair of the Department of Biomedical Informatics at Harvard Medical School and holding the Marion V. Nelson Professorship in Biomedical Informatics. Dr. Kohane's research focuses on applying computational techniques to address diseases at multiple scales, from healthcare systems to functional genomics, with a particular emphasis on neurodevelopmental disorders like autism.

In the realm of healthcare AI, Dr. Kohane has been a pioneering advocate for integrating artificial intelligence into medical practice. He emphasizes the importance of rigorous evaluation of AI tools, advocating for their assessment through randomized clinical trials to ensure they meet the same standards as other clinical interventions. He also highlights the potential of AI to enhance doctor-patient interactions by automating routine tasks, thereby allowing physicians to focus more on patient care.

Furthermore, Dr. Kohane has proposed the development of an Alignment Compliance Index to measure how well AI models align with human values, underscoring the need for ethical considerations in AI deployment within healthcare.

 

 Why does Zak view medicine as a knowledge-processing discipline?

 

 ChatGPT

Dr. Kohane advocates for viewing medicine as a knowledge-processing discipline. He emphasizes that medical practice fundamentally involves processing vast amounts of information, including patient data, medical records, clinical experiences, and the latest medical literature. By adopting this perspective, healthcare can integrate advanced computational tools and artificial intelligence to enhance decision-making, improve patient outcomes, and accelerate medical research. Dr. Kohane believes that embracing this approach is essential for the evolution of healthcare in the 21st century.

 

 If medicine is primarily a knowledge-processing discipline, LLM-based GenAI may already be smarter than most primary care doctors. This is good news. What is Dr. Kohane's perspective on leveraging GenAI to address the global primary care crisis?

 

 ChatGPT

He emphasizes that medicine is fundamentally a knowledge-processing discipline, suggesting that GenAI can assist in managing vast medical information and supporting clinical decision-making. Dr. Kohane advocates for the integration of AI into medical education to prepare future physicians for a healthcare landscape transformed by technology. He highlights that AI can alleviate administrative burdens, allowing clinicians to focus more on patient care.

However, he cautions that AI should complement, not replace, the human elements of medical practice, such as intuition and patient interaction. Dr. Kohane stresses the importance of rigorous evaluation of AI tools through randomized clinical trials to ensure their efficacy and safety in real-world settings.

In summary, Dr. Kohane envisions GenAI as a valuable tool to enhance primary care by supporting healthcare professionals, improving efficiencies, and enriching medical education, while maintaining the essential human touch in patient interactions.

For a deeper insight into Dr. Kohane's perspectives on AI in medicine, watch the video of his presentation at Brown University.

 

 How does the Alignment Compliance Index work?

 

 ChatGPT

The Alignment Compliance Index is a proposed metric designed to evaluate how closely artificial intelligence (AI) systems adhere to human values and ethical standards. While specific methodologies for this index are still under development, the concept aligns with the broader field of AI alignment, which focuses on ensuring that AI systems operate in ways that are consistent with human intentions and societal norms. The Alignment Compliance Index aims to provide a standardized measure of an AI system's conformity to desired ethical and functional parameters, thereby promoting safer and more reliable AI deployment.

 

~

 

GenAI/LHS Toolbox

 

 

ELHS GenAI Copilot Platform

The new copilot platform makes it easy for doctors and patients to use GenAI:

🚀 Fine-tune the Llama 3.1-8B LLM for high-accuracy disease prediction, overcoming the bottlenecks that most clinical teams face in GenAI clinical studies and applications (a minimal fine-tuning sketch follows below).
🚀 Use a free personal open-source LLM chatbot, My Copilot.
🚀 Access a free multi-LLM copilot, including GPT-4o, Gemini 1.5 Pro, and Llama 3.1.
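
For readers curious what such fine-tuning involves in practice, here is a minimal sketch of LoRA fine-tuning Llama 3.1-8B for a disease-prediction classification task, assuming a Hugging Face transformers + peft setup. This is not the ELHS platform's actual pipeline; the dataset files, label set, and hyperparameters are hypothetical.

import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

BASE = "meta-llama/Llama-3.1-8B"   # gated model; requires access approval and a large GPU

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token            # Llama defines no pad token by default

# Hypothetical binary outcome: disease vs. no disease.
model = AutoModelForSequenceClassification.from_pretrained(
    BASE, num_labels=2, torch_dtype=torch.bfloat16)
model.config.pad_token_id = tokenizer.pad_token_id

# Attach low-rank adapters so only a small fraction of the weights are trained.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                                         task_type="SEQ_CLS"))

# Hypothetical CSV files with "text" (clinical note) and "label" (0/1) columns.
data = load_dataset("csv", data_files={"train": "train.csv", "validation": "val.csv"})
data = data.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
                batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama31-disease-prediction",
                           per_device_train_batch_size=1, gradient_accumulation_steps=16,
                           num_train_epochs=3, learning_rate=2e-4, bf16=True),
    train_dataset=data["train"],
    eval_dataset=data["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),   # pads each batch dynamically
)
trainer.train()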

 

Meta AI

Meta AI helps you learn, create, and connect in new ways. With its latest features, you can talk with recognizable voices and share images with Meta AI to learn more about your surroundings. Ask Meta AI anything.

 

 

~

🔹 ELHS Institute Relevant Resources 🔹

🚀 ELHS GenAI Copilot Platform

🌐 ELHS Institute Website

📰 ELHS Newsletters

🎦 ELHS Videos

👉For LLM Fine-Tuning Services, Contact support@elhsi.org 📩

 

~ the end ~