ELHS Institute

The Rise of GenAI-Augmented Generalists

ELHS Newsletter 2024-03-29

Paving the Way for Global Health Equity with AI, ML, Data, and LHS (Learning Health Systems)

Dear Friends,

Our journey through the landscape of Artificial Intelligence (AI) in healthcare has encountered its share of challenges. Previous AI generations, which depended heavily on structured data, had limited impact in practical healthcare applications. Notably, initiatives like IBM Watson AI struggled to assist with diagnosis and to personalize treatment in real-world settings. Similarly, AI-based symptom checkers have not become the widely accepted initial triage resource we hoped for.

Despite these setbacks, the potential of AI in revolutionizing healthcare remains undeniable. The question that now arises is whether Generative AI (GenAI) can rekindle the aspirations we've held for AI's role in healthcare.

The latest developments in AI, particularly generative AI and large language models (LLMs) capable of processing unstructured data and learning from a multitude of information sources, signal a promising horizon. These advancements could mitigate the issues faced by earlier AI implementations. Nonetheless, the experience gained from past endeavors highlights the critical need for a methodical, evidence-based approach to integrating AI technologies into clinical settings.

Encouragingly, initial clinical evaluations of GenAI platforms, including ChatGPT, have demonstrated promising outcomes. This issue of our newsletter sheds light on recent studies that have benchmarked ChatGPT in various capacities, such as decision support, drug regimen predictions, and clinical text summarization.

Realizing the vision of the AI-augmented generalist in healthcare will likely depend on combining healthcare-specific LLMs, like Google's Med-PaLM, with general-purpose LLMs, such as OpenAI's ChatGPT. Each type brings its own strengths and challenges.

We stand on the brink of a new era, poised to witness the transformative impact of GenAI in healthcare. The road ahead is filled with potential and promise, beckoning us to continue exploring the vast capabilities of AI in improving patient care and outcomes.

Now, we have an even better chance of witnessing the rise of GenAI-Augmented Generalists sooner rather than later.

Enjoy reading the latest developments and my conversations with ChatGPT below.

Warm regards,

AJ Chen, PhD
ELHS Institute
web: elhsi.org
email: aj@elhsi.org

GenAI-Augmented Generalist

From Page Mill

Published papers, recent news, and significant events in a coherent narrative for the main topic.

Tu T, et al. Towards generalist biomedical ai. NEJM AI. 2024 Feb 22;1(3):AIoa2300138.

[2024/2] Med-PaLM M reached performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. In a side-by-side ranking on 246 retrospective chest x-rays, clinicians expressed a pairwise preference for Med-PaLM Multimodal reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility.

Sarkar U, Bates DW. Using Artificial Intelligence to Improve Primary Care for Patients and Clinicians. JAMA Intern Med. Published online February 12, 2024. doi:10.1001/jamainternmed.2023.7965

[2024/2] Primary care is desired by patients and is central to delivering high-quality health care, but its challenges have led to rampant burnout and a worsening shortage of primary care clinicians. One recent study determined that providing primary care for a panel of 2500 primary care patients would require 27 hours per day, 7 days per week. Electronic health records (EHRs) appear to be part of the issue, as primary care clinicians today are spending a large part of their day interacting with them. However, artificial intelligence (AI) promises to offer many ways to improve primary care processes. While the potential to enhance the quality and safety of health care in general and of primary care specifically using AI has been discussed for over 2 decades, the technological leap to widespread availability of generative AI suggests that use will soon affect the practice of primary care on a daily, if not an hourly, basis.

Sandmann S, et al. Systematic analysis of ChatGPT, Google search and Llama 2 for clinical decision support tasks. Nature Communications. 2024;15(1):2050.

[2024/3] We evaluate clinical accuracy of GPT-3.5 and GPT-4 for suggesting initial diagnosis, examination steps and treatment of 110 medical cases across diverse clinical disciplines. Moreover, two model configurations of the Llama 2 open source LLMs are assessed in a sub-study. Overall, GPT-4 performed best with superior performances over GPT-3.5 considering diagnosis and examination and superior performance over Google for diagnosis. Except for treatment, better performance on frequent vs rare diseases is evident for all three approaches. The sub-study indicates slightly lower performances for Llama models.

Li T, et al. CancerGPT for few shot drug pair synergy prediction using large pretrained language models. npj Digit. Med. 2024;7:40. doi:10.1038/s41746-024-01024-9.

[2024/2] Here we report our proposed few-shot learning approach, which uses LLMs to predict the synergy of drug pairs in rare tissues that lack structured data and features. Our experiments, which involved seven rare tissues from different cancer types, demonstrate that the LLM-based prediction model achieves significant accuracy with very few or zero samples. Our proposed model, the CancerGPT (with ~ 124M parameters), is comparable to the larger fine-tuned GPT-3 model (with ~ 175B parameters). Our research contributes to tackling drug pair synergy prediction in rare tissues with limited data, and also advancing the use of LLMs for biological and medical inference tasks.

Van Veen D, et al. Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts. Nat Med. 2024 Feb 27. doi:10.1038/s41591-024-02855-5.

[2024/2] Here we applied adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks: radiology reports, patient questions, progress notes and doctor–patient dialogue. Quantitative assessments with syntactic, semantic and conceptual NLP metrics reveal trade-offs between models and adaptation methods. Our research provides evidence of LLMs outperforming medical experts in clinical text summarization across multiple tasks. This suggests that integrating LLMs into clinical workflows could alleviate documentation burden, allowing clinicians to focus more on patient care.

In the Eyes of GPTs

GPTs are becoming an integral part of our society. AJ talks with GenAI to understand how machines view the main topics.

What does a medical generalist do? Why do generalists play important roles in healthcare?

ChatGPT

A medical generalist typically refers to a healthcare professional who is broadly trained to provide a wide range of medical care rather than specializing in a narrow field. This term is often synonymous with primary care physicians (PCPs) or general practitioners (GPs). Here are the key responsibilities and roles of a medical generalist:

Responsibilities and Roles

First Point of Contact: Medical generalists are often the first healthcare professionals patients consult for health issues. They provide an entry point into the healthcare system.
Broad Range of Care: They diagnose and treat a wide variety of acute and chronic conditions, from common illnesses and injuries to managing long-term diseases.
Preventive Care: Generalists play a crucial role in preventive care, offering health education, vaccinations, and screening for early detection of diseases.
Continuity of Care: They provide ongoing care for patients over the long term, developing a comprehensive understanding of their health history and needs.
Coordination of Care: When specialized care is necessary, generalists coordinate with specialists to ensure that the patient receives comprehensive and cohesive treatment.
Holistic Approach: They consider the physical, emotional, and social aspects of a patient's health, offering care that addresses the full spectrum of health needs.
Community Health: Generalists often have a strong understanding of the health trends and needs within their local communities, allowing them to tailor their care approaches accordingly.

Importance in Healthcare

Accessibility: Generalists are more widely available than specialists, making healthcare more accessible to the general population.
Efficiency: By serving as the primary point of care, they help reduce the burden on specialist services and hospitals, making the healthcare system more efficient.
Cost-effectiveness: Primary care, led by generalists, is associated with lower healthcare costs, as it emphasizes preventive care and reduces the need for more expensive specialist and hospital care.
Patient Outcomes: Continuity of care with a generalist has been linked to better health outcomes, including lower mortality rates, as it ensures ongoing monitoring and management of health conditions.
Public Health: They play a vital role in public health initiatives, including vaccination campaigns and health education, contributing to the overall health of the population.

In summary, medical generalists are integral to the healthcare system, providing comprehensive, accessible, and coordinated care that meets the majority of patients' health needs. Their role in preventive care, continuity of care, and coordination with specialists ensures that patients receive holistic and efficient healthcare, ultimately improving individual and public health outcomes.

Is it a significant challenge to ensure enough PCPs are available for everyone in the world? Why?

ChatGPT

Yes, it's a significant challenge to have enough primary care physicians (PCPs) for everyone due to factors like uneven distribution of doctors, rising healthcare demands of aging populations, and insufficient numbers of medical students entering primary care. Additionally, rural and underserved areas often face greater shortages, exacerbating healthcare accessibility and equity issues globally.

How can generative AI help address the shortage of PCPs in the world?

ChatGPT

Generative AI can help address the shortage of primary care physicians (PCPs) in several innovative ways:

Automated Diagnostics: AI can analyze medical data, such as imaging and lab results, to help diagnose diseases, potentially reducing the workload on PCPs.
Virtual Health Assistants: AI-driven chatbots can provide preliminary consultations, health information, and self-care recommendations, easing the demand for direct PCP consultations for minor health concerns.
Personalized Treatment Plans: By analyzing vast amounts of health data, AI can assist in creating personalized treatment plans, improving the efficiency of patient care.
Education and Training: AI can support the training of medical students and professionals, making education more accessible and potentially encouraging more students to enter primary care fields.
Streamlining Administrative Tasks: AI can automate administrative tasks like scheduling, billing, and patient record management, allowing PCPs to focus more on patient care.
Remote Monitoring and Telehealth: AI can enhance remote monitoring of chronic conditions through devices that collect and analyze patient data, providing insights to PCPs without the need for in-person visits.

By augmenting the capabilities of PCPs and making healthcare delivery more efficient, generative AI has the potential to significantly mitigate the impact of the global shortage of primary care providers.

Are there any GenAI tools or LLMs capable of assisting PCPs in providing a high standard of care, including diagnosis and treatment?

ChatGPT

Yes, there are several generative AI tools and large language models (LLMs) capable of assisting primary care physicians (PCPs) to provide a high standard of care, including diagnosis and treatment. Some notable examples and areas of application include:

Diagnostic Assistance: AI tools like IBM Watson Health have been used to assist in diagnosing complex conditions by analyzing vast amounts of medical literature and patient data. Similarly, Google's DeepMind has made strides in diagnosing eye diseases and cancer more accurately.
Treatment Recommendations: AI systems can analyze data from numerous sources to suggest personalized treatment plans. This can be particularly useful for managing chronic conditions like diabetes, where AI algorithms can help in monitoring patient data and recommending insulin doses.
Natural Language Processing (NLP) for Patient Records: LLMs are increasingly used to interpret and organize unstructured patient data, such as clinical notes, into structured formats, aiding in patient management and research.
Virtual Health Assistants: Tools like chatbots powered by AI can perform initial patient assessments, collect patient history, and provide health information. This can help triage patients and ensure PCPs focus on more critical cases.
Mental Health Support: AI-powered platforms offer cognitive behavioral therapy techniques to support patients with anxiety and depression, supplementing the care provided by healthcare professionals.
Continuing Education for PCPs: AI-driven educational platforms can offer personalized learning experiences for PCPs, keeping them updated on the latest research, treatment modalities, and clinical guidelines.

While these AI tools and LLMs offer significant potential to enhance the quality and efficiency of care provided by PCPs, it's important to note that they are designed to assist, not replace, human healthcare providers. The use of such technologies in healthcare also raises important considerations regarding privacy, data security, and the need for thorough validation to ensure accuracy and reliability.

I believe the previous generation of AI, which required structured data, has made little impact in this direction. For example, IBM Watson AI struggled to assist in diagnosing diseases and personalizing treatments in real-world clinical settings. Similarly, other AI-based symptom checkers did not become widely accepted as a first-line triage resource for patients. Do you agree?

ChatGPT

Yes, I agree. While the potential of AI in healthcare has been widely recognized, the practical application and impact of earlier generations of AI, including systems like IBM Watson, have faced significant challenges. These include:

Data Complexity and Quality: Healthcare data is often unstructured and complex, making it difficult for AI systems that rely on structured data to accurately interpret and analyze it without sophisticated preprocessing.
Integration into Clinical Workflows: AI tools have struggled to seamlessly integrate into existing clinical workflows, which is crucial for their adoption by healthcare professionals.
Accuracy and Reliability: The accuracy of AI-powered diagnostic tools and symptom checkers has been variable, leading to skepticism among healthcare providers about relying on these tools for critical decisions.
Regulatory and Ethical Concerns: Regulatory hurdles and ethical concerns about patient privacy, data security, and the accountability of AI decisions have also limited the adoption of these technologies.
Expectations vs. Reality: Overhyped expectations have sometimes overshadowed the real-world capabilities and limitations of these AI systems, leading to disappointment when they fail to deliver as promised.

The evolution of AI, particularly with advancements in generative AI and large language models that can better handle unstructured data and learn from diverse sources of information, holds promise for overcoming some of these challenges. However, the lessons learned from the limitations and setbacks of earlier AI applications in healthcare underscore the importance of a careful, evidence-based approach to integrating AI technologies into clinical practice.

The initial clinical evaluation studies for GenAI, like ChatGPT, have so far shown encouraging signs. However, it is absolutely necessary to systematically demonstrate which clinical tasks in real clinical settings can benefit from using GenAI tools as a new information source. Until large-scale evidence becomes available, the rise of AI-augmented generalists remains a dream. Technically, will realizing this dream mostly depend on healthcare-specialized LLMs (like Google’s Med-PaLM) or general-purpose LLMs (like OpenAI’s ChatGPT)?

ChatGPT

Realizing the dream of AI-augmented generalists in healthcare will likely depend on a combination of both healthcare-specialized LLMs (like Google's Med-PaLM) and general-purpose LLMs (like OpenAI's ChatGPT), each offering unique advantages and facing distinct challenges:

Healthcare-Specialized LLMs

Tailored Expertise: Specialized LLMs are trained on vast amounts of medical literature, clinical guidelines, and patient data, making them better equipped to handle specific medical queries with higher accuracy.
Regulatory Compliance: They can be more easily designed to comply with healthcare regulations such as HIPAA in the US, ensuring patient data privacy and security.
Clinical Integration: These models may find easier integration into clinical workflows and electronic health records (EHR) systems, offering seamless support for diagnosis, treatment planning, and patient monitoring.

Challenges:

Data Bias and Quality: Ensuring the quality and representativeness of training data is critical to avoid bias and inaccuracies in medical recommendations.
Continuous Updating: Medical knowledge and guidelines evolve rapidly, requiring ongoing updates to the model's knowledge base.

General-Purpose LLMs

Flexibility and Adaptability: General LLMs can process a wide range of queries beyond strictly medical ones, which is useful in addressing the holistic needs of patients, including mental health, lifestyle advice, and more.
Accessibility: They are more widely available and can be easily accessed by healthcare professionals and patients alike, making them valuable tools for education and general health inquiries.

Challenges:

Specificity and Accuracy: While very versatile, general-purpose LLMs might lack the depth of medical knowledge and specificity required for accurate clinical decision-making.
Ethical and Legal Concerns: The use of general-purpose LLMs in healthcare raises questions about responsibility, accountability, and privacy that must be carefully navigated.

Conclusion

The integration of both types of LLMs into healthcare could offer the best path forward. Specialized LLMs can provide the depth of knowledge needed for clinical tasks, while general-purpose LLMs offer breadth and flexibility for broader healthcare applications. Collaboratively, they could enhance the capabilities of PCPs, offering comprehensive support that spans from diagnostic assistance to holistic patient care.

Achieving widespread acceptance and integration of AI in healthcare will require rigorous validation through clinical trials and real-world studies to build a robust evidence base. This includes demonstrating the safety, efficacy, and reliability of AI tools in diverse clinical settings and patient populations. Collaboration between AI developers, healthcare professionals, regulatory bodies, and patients will be essential to navigate the ethical, legal, and social implications of AI in healthcare.

It makes sense that both types of LLMs might need to work together to achieve the desired prediction accuracy. Therefore, the key is to focus on measuring clinical outcomes. If general LLMs can improve outcomes for some tasks, they will be a good choice for those tasks. If specialized LLMs outperform them on other tasks, then they are preferable there. It seems there is a need for new benchmarking systems that can help ensure the sound development and deployment of generalist AI. Do you have any suggestions for such benchmarking systems?

ChatGPT

Designing benchmarking systems for evaluating the efficacy and safety of AI applications in healthcare, particularly for both generalist and specialist AI models, requires a multidimensional approach that accounts for clinical outcomes, user interaction, and ethical considerations. Here are some suggestions for such benchmarking systems:

1. Clinical Outcome Benchmarks

Accuracy and Reliability: Measure the accuracy of diagnoses, treatment recommendations, and prognostic predictions against established medical standards and outcomes.
Patient Safety: Assess the impact of AI recommendations on patient safety, including monitoring for adverse outcomes directly attributable to AI advice.
Effectiveness: Evaluate the real-world effectiveness of AI interventions in improving patient health outcomes compared to standard care.

2. Usability and Integration Benchmarks

User Experience (UX): Assess the usability of AI tools by healthcare professionals and patients, including ease of use, satisfaction, and the tool's integration into clinical workflows.
Interoperability: Evaluate the ability of AI systems to integrate with existing healthcare IT ecosystems, such as electronic health records (EHR) systems, without disrupting workflows.

3. Ethical and Regulatory Compliance Benchmarks

Privacy and Security: Measure compliance with healthcare data regulations (e.g., HIPAA in the US, GDPR in the EU) and the ability to safeguard patient information.
Bias and Fairness: Evaluate AI systems for biases in treatment recommendations and outcomes across different demographics, ensuring equitable healthcare delivery.
Transparency and Explainability: Assess the ability of AI systems to provide explanations for their recommendations in understandable terms, allowing for clinician oversight.

4. Performance and Scalability Benchmarks

Response Time: Measure the speed at which AI systems can deliver recommendations, crucial for their feasibility in fast-paced clinical settings.
Scalability: Assess the capability of AI systems to handle increasing volumes of queries and data without degradation in performance.

5. Continuous Learning and Improvement Benchmarks

Adaptability: Evaluate the ability of AI systems to learn from new data and adapt to evolving medical knowledge and clinical guidelines.
Longitudinal Performance: Monitor the performance of AI systems over time to ensure they maintain accuracy and reliability as they are exposed to new data and clinical scenarios.

Implementing the Benchmarking Systems

The implementation of these benchmarking systems could involve:

Collaboration between Stakeholders: Engage healthcare professionals, AI developers, patients, and regulatory bodies in the development of benchmarks to ensure they are comprehensive and relevant.
Standardized Testing Environments: Develop standardized datasets and scenarios for testing AI systems, facilitating comparison across different models and applications.
Real-world Monitoring: In addition to controlled evaluations, monitor the performance of AI systems in real-world clinical settings to capture their impact on a variety of clinical outcomes and settings.

Developing and deploying such comprehensive benchmarking systems will be crucial for ensuring that AI tools in healthcare are safe, effective, and contribute positively to patient care and outcomes.
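To make two of the suggested benchmarks concrete, here is a minimal Python sketch of how per-task accuracy (a clinical outcome benchmark) and an accuracy gap across patient groups (a bias and fairness benchmark) might be computed from graded model answers. Everything in it is an illustrative assumption rather than an existing benchmark: the `CaseResult` record, the model labels "general-llm" and "medical-llm", the group labels, and the toy data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    """One graded model answer on one benchmark case (all fields hypothetical)."""
    model: str     # e.g. "general-llm" or "medical-llm"
    task: str      # e.g. "diagnosis" or "summarization"
    group: str     # demographic stratum used for the fairness check
    correct: bool  # whether clinician graders accepted the answer


def accuracy_by(results, key):
    """Return {key(result): fraction of correct answers} over the results."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in results:
        k = key(r)
        totals[k] += 1
        hits[k] += int(r.correct)
    return {k: hits[k] / totals[k] for k in totals}


def best_model_per_task(results):
    """For each task, pick the model with the highest accuracy on that task."""
    acc = accuracy_by(results, lambda r: (r.task, r.model))
    best = {}
    for (task, model), value in acc.items():
        # On a tie, the model seen first is kept.
        if task not in best or value > best[task][1]:
            best[task] = (model, value)
    return best


def fairness_gap(results, model):
    """Largest accuracy difference between demographic groups for one model."""
    acc = accuracy_by([r for r in results if r.model == model],
                      lambda r: r.group)
    return max(acc.values()) - min(acc.values())


# Toy graded results, invented purely for illustration.
results = [
    CaseResult("general-llm", "diagnosis", "A", True),
    CaseResult("general-llm", "diagnosis", "B", False),
    CaseResult("medical-llm", "diagnosis", "A", True),
    CaseResult("medical-llm", "diagnosis", "B", True),
    CaseResult("general-llm", "summarization", "A", True),
    CaseResult("general-llm", "summarization", "B", True),
    CaseResult("medical-llm", "summarization", "A", True),
    CaseResult("medical-llm", "summarization", "B", False),
]

print(best_model_per_task(results))
print(fairness_gap(results, "general-llm"))
```

In this toy data the specialized model scores higher on diagnosis and the general-purpose model scores higher on summarization, which mirrors the idea of choosing a model per task based on measured results. A real benchmark would need far larger case sets, clinician-validated grading, and patient outcome measures rather than a simple correct/incorrect flag.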

Great suggestions. Thanks.

~ the end ~
