Healthcare GenAI Agents are Coming


ELHS Newsletter 2024-04-29

Paving the Way for Global Health Equity with AI, ML, Data, and LHS (Learning Health Systems)

Subscribe | All Issues


Dear Friends,


Healthcare chatbots are potential key components of broader patient-facing AI agents. However, a 2022 review by Parmar et al. found that the previous generation of chatbots fell short of marketing claims about natural language interaction. Now that ChatGPT-like chatbots powered by large language models (LLMs) can converse naturally with patients and care providers, demonstrating apparently logical reasoning, the development of generative AI agents is at the forefront of healthcare GenAI research.


In a recent commentary, UCSF researchers argued that LLMs should be evaluated in the clinic as AI agents, not merely as language models. Drawing an analogy to autonomous vehicle development, Stanford University researchers proposed a staged pathway toward autonomous behavioral healthcare. And in one example, GPT agents grounded in medical guidelines improved the accuracy and empathy of responses to traumatic brain injury rehabilitation questions.

Does it sound like healthcare GenAI agents are on their way?


Enjoy reading the latest developments and my conversations with ChatGPT below. 


Warm regards,



AJ Chen, PhD
ELHS Institute




Healthcare GenAI Agent



From Page Mill

Published papers, recent news, and significant events, presented as a coherent narrative on the main topic.

Mehandru, N., Miao, B.Y., Almaraz, E.R. et al. Evaluating large language models as agents in the clinic. npj Digit. Med. 7, 84 (2024).

[2024/4] These LLMs are not just capable of modeling language, but can also act as intelligent “agents” that interact with stakeholders in open-ended conversations and even influence clinical decision-making. Rather than relying on benchmarks that measure a model’s ability to process clinical data or answer standardized test questions, LLM agents can be modeled in high-fidelity simulations of clinical settings and should be assessed for their impact on clinical workflows. These evaluation frameworks, which we refer to as “Artificial Intelligence Structured Clinical Examinations” (“AI-SCE”), can draw from comparable technologies where machines operate with varying degrees of self-governance, such as self-driving cars, in dynamic environments with multiple stakeholders. Developing these robust, real-world clinical evaluations will be crucial towards deploying LLM agents in medical settings.


Stade, E.C., Stirman, S.W., Ungar, L.H. et al. Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation. npj Mental Health Res 3, 12 (2024).

[2024/4] This paper provides a roadmap for the ambitious yet responsible application of clinical LLMs in psychotherapy. First, a technical overview of clinical LLMs is presented. Second, the stages of integration of LLMs into psychotherapy are discussed while highlighting parallels to the development of autonomous vehicle technology. Third, potential applications of LLMs in clinical care, training, and research are discussed, highlighting areas of risk given the complex nature of psychotherapy. Fourth, recommendations for the responsible development and evaluation of clinical LLMs are provided, which include centering clinical science, involving robust interdisciplinary collaboration, and attending to issues like assessment, risk detection, transparency, and bias. Lastly, a vision is outlined for how LLMs might enable a new generation of studies of evidence-based interventions at scale, and how these studies may challenge assumptions about psychotherapy.


Zhenzhu, L., Jingfeng, Z., Wei, Z. et al. GPT-agents based on medical guidelines can improve the responsiveness and explainability of outcomes for traumatic brain injury rehabilitation. Sci Rep 14, 7626 (2024).

[2024/4] In this study, medical guidelines and agents based on GPT-4 were used to answer questions related to TBI rehabilitation. This system automatically evaluates the correctness of the answers, simultaneously providing relevant content from the medical guidelines to enhance explainability. The evaluation revealed that the responses generated by the guideline-based GPT-agents performed better in terms of accuracy, explainability, and empathy than those obtained by directly querying GPT-4.  


Parmar, P., Ryu, J., Pandya, S. et al. Health-focused conversational agents in person-centered care: a review of apps. npj Digit. Med. 5, 21 (2022).

[2022] Our assessment indicated that only a few apps use machine learning and natural language processing approaches, despite such marketing claims. Most apps allowed for a finite-state input, where the dialogue is led by the system and follows a predetermined algorithm. Healthbots are potentially transformative in centering care around the user; however, they are in a nascent state of development and require further research on development, automation and adoption for a population-level health impact.
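The "finite-state input" design described above, where the system leads the dialogue along a predetermined path, can be illustrated with a minimal sketch. The states, prompts, and transitions here are invented for illustration and are not taken from any reviewed app:

```python
# Minimal finite-state "healthbot" sketch: the system leads the dialogue
# and only accepts inputs that match a predetermined transition table.
# States, prompts, and transitions are illustrative inventions.

DIALOGUE = {
    "start":    {"prompt": "Do you want to (1) log symptoms or (2) book a visit?",
                 "transitions": {"1": "symptoms", "2": "booking"}},
    "symptoms": {"prompt": "Rate your pain from 0 to 10.",
                 "transitions": {str(i): "done" for i in range(11)}},
    "booking":  {"prompt": "Morning or afternoon?",
                 "transitions": {"morning": "done", "afternoon": "done"}},
    "done":     {"prompt": "Thank you, goodbye.", "transitions": {}},
}

def step(state, user_input):
    """Advance the dialogue; unrecognized input re-asks the same question."""
    node = DIALOGUE[state]
    next_state = node["transitions"].get(user_input.strip().lower(), state)
    return next_state, DIALOGUE[next_state]["prompt"]
```

Unlike an LLM, such a system cannot handle any input outside its transition table, which is one reason the review found these apps fell short of their natural-language marketing claims.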


Abbasian, M., Khatibi, E., Azimi, I. et al. Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI. npj Digit. Med. 7, 82 (2024).

[2024/4] Existing evaluation metrics proposed for various generic large language models (LLMs) demonstrate a lack of comprehension regarding medical and health concepts and their significance in promoting patients’ well-being. The purpose of this paper is to explore state-of-the-art LLM-based evaluation metrics that are specifically applicable to the assessment of interactive conversational models in healthcare. Subsequently, we present a comprehensive set of evaluation metrics designed to thoroughly assess the performance of healthcare chatbots from an end-user perspective. These metrics encompass an evaluation of language processing abilities, impact on real-world clinical tasks, and effectiveness in user-interactive conversations. Finally, we engage in a discussion concerning the challenges associated with defining and implementing these metrics, with particular emphasis on confounding factors such as the target audience, evaluation methods, and prompt techniques involved in the evaluation process.


Katz U, et al. GPT versus Resident Physicians — A Benchmark Based on Official Board Scores. NEJM AI 2024;1(5). DOI: 10.1056/AIdbp2300192.

[2024/4] We evaluated the performance of generative pretrained transformer 3.5 (GPT-3.5) and GPT-4 on the 2022 Israeli board residency examinations and compared the results with those of 849 practicing physicians. GPT-4 passed the board residency examination in four of five specialties, revealing a median score higher than the official passing score of 65%. Given the maturity of this rapidly improving technology, the adoption of LLMs in clinical medical practice is imminent.


Rydzewski NR, et al. Comparative Evaluation of LLMs in Clinical Oncology. NEJM AI 2024;1(5). DOI: 10.1056/AIoa2300151.

[2024/4] This study tested five publicly available LLMs on a comprehensive battery of 2044 oncology questions. GPT-4 was the only model to perform above the 50th percentile.  Given the enthusiasm to integrate these new implementations of AI into clinical practice, continued standardized evaluations of the strengths and limitations of these products will be critical to guide both patients and medical professionals.


Mao D, et al. AI-MARRVEL — A Knowledge-Driven AI System for Diagnosing Mendelian Disorders. NEJM AI 2024;1(5). DOI: 10.1056/AIoa2300009.

[2024/4] AIM improved the rate of accurate genetic diagnosis, doubling the number of solved cases as compared with benchmarked methods, across three distinct real-world cohorts. AIM achieved superior accuracy compared with existing methods for genetic diagnosis. We anticipate that this tool may aid in primary diagnosis, reanalysis of unsolved cases, and the discovery of novel disease genes. 


Feldman J, et al. Scaling Note Quality Assessment Across an Academic Medical Center with AI and GPT-4. NEJM Catal Innov Care Deliv 2024;5(5). DOI: 10.1056/CAT.23.0283.

[2024/4] This case study explores how NYU Langone Health leveraged artificial intelligence (AI) to address the challenge to improve the content and quality of medical documentation. By quickly and accurately analyzing large volumes of clinical documentation and providing feedback to organizational leadership and individually to providers, AI can help support a culture of continuous note quality improvement, allowing organizations to enhance a critical component of patient care.







GPTs are becoming an integral part of our society. AJ talks with GenAI to understand how machines view the main topics.

 A recent hot topic is the development of generative AI agents. What does the term “AI agent” mean?



The term "AI agent" refers to a system or software that can operate autonomously or semi-autonomously to perform tasks or achieve goals in a given environment. These agents use Artificial Intelligence to analyze their surroundings, make decisions, and execute actions based on their programming and learning. Here are some key features of AI agents:

  • Autonomy: AI agents can operate without human intervention to a significant extent. They make decisions based on the data they perceive and their pre-programmed goals.

  • Reactivity: They perceive their environment and respond to changes that occur in it in a timely fashion.

  • Proactivity: Beyond responding to conditions and changes, AI agents can take initiative, exhibiting goal-directed behavior to fulfill their designated objectives.

  • Social ability: AI agents can interact with other agents (including humans) to complete complex tasks or negotiate solutions to problems.

In the context of generative AI, these agents can create or generate new content, solutions, or data based on learned patterns and training data. This includes things like text (as in conversational AI models), images, music, code, or any other form of digital output.
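The four properties above can be sketched as a minimal perceive-decide-act loop. The "environment" and the rule-based policy are illustrative stand-ins; a generative AI agent would replace the decision rule with an LLM call:

```python
# Minimal agent-loop sketch illustrating autonomy, reactivity,
# proactivity, and social ability. All names here are invented.

def perceive(environment):
    """Reactivity: read the current state of the environment."""
    return environment["pending_messages"]

def decide(observations, goal):
    """Autonomy/proactivity: choose actions toward the goal without human input."""
    actions = []
    for msg in observations:
        if "appointment" in msg:
            actions.append(("schedule", msg))
        else:
            actions.append(("reply", msg))   # social ability: respond to humans
    if not observations:
        actions.append(("check_in", goal))   # proactive, goal-directed behavior
    return actions

def run_agent(environment, goal, max_steps=3):
    """Loop: perceive, decide, act; here 'acting' just records the action."""
    log = []
    for _ in range(max_steps):
        for action, payload in decide(perceive(environment), goal):
            log.append((action, payload))
        environment["pending_messages"] = []  # messages handled this step
    return log
```

With pending messages, the agent reacts to them; once the queue is empty, it takes the initiative and checks in on its goal, which is the distinction the bullet list above draws between reactivity and proactivity.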


 There is much research focused on developing AI agents for healthcare. Please summarize the current status of healthcare AI agents’ R&D.



Research and development of AI agents in healthcare continue to advance rapidly, focusing on a variety of applications aimed at improving patient outcomes, streamlining healthcare operations, and addressing workforce challenges.

  • Generative AI is a key area of innovation, particularly in creating synthetic data for training healthcare AI without compromising patient privacy. This has potential applications in improving the accuracy of future diagnoses and enhancing patient satisfaction (SCA Health).

  • Clinical decision-support (CDS) tools are also seeing progress. These tools help in diagnosing and treating patients by analyzing large datasets to provide recommendations. Their development is being driven by evidence from clinical trials, highlighting a shift toward more evidence-based AI applications in healthcare (McKinsey & Company).

  • AI in routine healthcare tasks: There is growing implementation of AI for routine tasks such as medical scribing, patient triage, and scheduling. These applications are especially significant in addressing workforce shortages and reducing the workload on healthcare professionals by automating administrative tasks (McKinsey & Company; Rock Health).

  • Challenges and barriers: Despite these advancements, the integration of AI in healthcare faces several challenges, including ethical concerns, data privacy issues, and the need for rigorous regulation. Moreover, there is a gap between the development of AI technologies and their practical implementation in clinical settings, which needs to be bridged for these technologies to have a significant impact (BioMed Central).

  • Institutional initiatives: Leading institutions like UCSF are spearheading the integration of AI in healthcare, leveraging their extensive patient databases and collaborations across academia and industry to drive forward AI research and applications.

Overall, the future of healthcare AI looks promising with continuous advancements in technology and an increasing focus on creating practical, safe, and effective AI-driven solutions. However, the full potential of AI in healthcare will depend on overcoming the regulatory, ethical, and operational challenges currently facing the field.


 Tell me about all the research you know of that uses Generative AI, like ChatGPT, for developing healthcare AI agents.



Generative AI, similar to models like ChatGPT, is increasingly being explored for its potential to revolutionize various aspects of healthcare. Here are some key areas of research involving generative AI in healthcare:

  • Synthetic Data Generation: Generative AI is being used to create realistic, anonymized patient data that can be used for training other AI systems without risking patient privacy. This synthetic data helps in developing algorithms that can predict patient outcomes, personalize treatment plans, and enhance clinical decision-making processes.

  • Automating Administrative Tasks: AI models are employed to automate routine tasks such as generating medical documentation, summarizing patient interactions, and managing billing processes. This reduces the administrative burden on healthcare professionals and allows them to focus more on patient care.

  • Drug Discovery and Development: Generative AI is utilized in the pharmaceutical industry to accelerate the drug discovery process. It can generate novel molecular structures that could potentially be developed into new drugs, thus speeding up early-stage drug development and reducing costs.

  • Personalized Medicine: By analyzing vast amounts of medical data, generative AI can help in developing personalized treatment plans tailored to individual patients’ genetic profiles and health conditions. This approach aims to improve treatment efficacy and reduce side effects.

  • Medical Imaging: AI models are trained to interpret medical images such as X-rays, MRIs, and CT scans. They can assist in diagnosing conditions more quickly and accurately than traditional methods, often detecting subtle features that may be overlooked by human eyes.

  • Patient Interaction: AI-driven chatbots and virtual health assistants can provide patients with 24/7 support, answering general health queries, providing medication reminders, and helping with appointment scheduling. These agents are designed to understand natural language and interact in a conversational manner.

  • Ethical and Regulatory Considerations: Research is also focused on addressing the ethical implications of using AI in healthcare, including issues of bias, transparency, and accountability in AI decisions, as well as ensuring compliance with healthcare regulations.

Each of these research areas contributes to the broader goal of improving healthcare delivery and patient outcomes through advanced AI technologies. The development of these systems requires ongoing collaboration between AI researchers, clinicians, and regulatory bodies to ensure they are safe, effective, and equitable.


 Do you know of any disease areas where GenAI agents can have an immediate impact?



Generative AI (GenAI) agents have the potential to make immediate impacts in several disease areas by enhancing diagnosis, treatment, and patient management. Here are some key areas where GenAI could be particularly effective:

  • Oncology: GenAI can help in the early detection and diagnosis of cancer through improved imaging analysis and by identifying patterns in genetic mutations associated with cancer risks. Additionally, it can assist in personalized treatment planning by predicting how different cancers may respond to various therapies.

  • Cardiology: AI can analyze electrocardiograms (ECGs) and other imaging data to detect heart diseases such as atrial fibrillation earlier and with greater accuracy than traditional methods. It can also help in managing chronic conditions like heart failure by monitoring patient data and predicting adverse events.

  • Neurology: In diseases like Alzheimer’s and Parkinson’s, GenAI can be used to detect early signs of cognitive decline and motor dysfunction. It can analyze speech and movement patterns to aid in early diagnosis and to monitor disease progression.

  • Diabetes Management: AI agents can help in managing diabetes by predicting blood sugar levels based on diet, medication, activity, and other personal data inputs. This can aid in personalized diabetes care, improving patient adherence to treatment and preventing complications.

  • Infectious Diseases: During outbreaks, AI models can predict infection spread patterns and identify potential hotspots, aiding in public health responses. They can also assist in developing treatment strategies by analyzing the effectiveness of various interventions.

  • Mental Health: AI can support mental health by providing cognitive behavioral therapy through chatbots, monitoring patient mood and behavior patterns, and alerting healthcare providers when intervention may be needed to prevent a crisis.

In all these areas, the use of GenAI can lead to more accurate diagnoses, tailored treatments, and proactive healthcare management, potentially transforming patient outcomes in these critical disease areas.



~ the end ~