
Synthetic patient data – key to accelerating responsible LLMs in healthcare

 

ELHS Newsletter 2025-01-02

Subscribe | All Issues

 

(https://elhsi.com/myllm/finetuned)

 

 

Dear Friends,

The NAM leaders have reviewed the progress of Learning Health Systems (LHS) and provided a trust framework for their future development. More importantly, their perspective recognizes that generative AI will transform healthcare in unprecedented ways. I believe that ML-enabled LHS will fulfill the promise envisioned for LHS. In 2022, we published the first ML-enabled LHS unit built on synthetic patient data, initiating a subfield of simulated ML-LHS. Today, generative AI (GenAI) has significantly simplified the creation of ML-LHS units.

 

As large language models (LLMs) consume all publicly available data, they may soon exhaust the existing supply. According to a recent Nature news article, estimates suggest that by 2028, developers will be working with datasets equivalent to the total text available on the Internet. When that point is reached, synthetic data will likely fill the gap. In fact, synthetic data is already being used to train LLMs.

 

After successfully developing ML models from real EHR data using the same pipeline we created for synthetic patient data, I hypothesized that synthetic patient data would accelerate ML/AI development in the future. This hypothesis has now been validated: synthetic patient data is proving to be a key enabler for fine-tuning open-source LLMs like Llama for healthcare applications.

 

For instance, clinical teams are currently facing a significant bottleneck: the lack of preclinically validated LLMs for use in clinical studies. This challenge prevents doctors from initiating LLM clinical research. To address this, we recently demonstrated that synthetic patient data can be used to fine-tune open-source Llama3.1-8B models, improving diagnostic prediction and providing preclinical validation data for disease prediction. With these pre-validated, fine-tuned LLMs, any doctor can now explore the benefits of GenAI in clinical care.

 

Our breakthrough in creating high-accuracy, fine-tuned LLMs using synthetic data will undoubtedly accelerate the democratization of GenAI in healthcare—a core mission of the ELHS Institute.

 

Keep reading to explore the latest papers below, and enjoy my conversations with ChatGPT about using synthetic data to accelerate GenAI development.

 

Warm regards,

AJ

AJ Chen, PhD | ELHS Institute | https://elhsi.org
 

~

 

 

From Page Mill

 

McGinnis, J. M., H. V. Fineberg, and V. J. Dzau. Shared commitments for health and health care: A trust framework from the learning health system. NAM Perspectives. Commentary, National Academy of Medicine, Washington, DC. https://doi.org/10.31478/202412c.

[2024] The potential for practical application of generative AI in health and health care sets the stage for accelerated real-time learning, evidence generation, and expedited application for continuous improvement in outcomes. This perspective presents the shared commitments as a trust framework for health and health care services, clearly expressing the primary priorities for all health and health care stakeholders and, hence, their service as drivers of organizational culture and synergy.

-

Hartman V, Zhang X, Poddar R, et al. Developing and Evaluating Large Language Model–Generated Emergency Medicine Handoff Notes. JAMA Netw Open. 2024;7(12):e2448723. doi:10.1001/jamanetworkopen.2024.48723

[2024/12] Can a large language model (LLM) generate emergency medicine (EM)-to-inpatient (IP) handoff notes that are useful and safe for EM care? In this cohort study of 1600 EM patient medical records using a novel evaluation framework, the LLM-generated EM-to-IP handoff notes had a mean usefulness of 4.04 out of 5 (compared with 4.36 for physician-written) and a mean patient safety score of 4.06 out of 5 (compared with 4.50 for physician-written), with no critical patient safety risks. These findings suggest the value of a manual, patient safety–focused clinical evaluation of LLMs and the potential of LLM-generated handoff notes to create a new standard of care in EM.

-

Klang E, Apakama D, Abbott EE, et al. A strategy for cost-effective large language model use at health system-scale. NPJ Digit Med. 2024 Nov 18;7(1):320. doi: 10.1038/s41746-024-01315-1.

[2024/11] We assessed ten LLMs of different capacities and sizes utilizing real-world patient data. We conducted >300,000 experiments of various task sizes and configurations, measuring accuracy in question-answering and the ability to properly format outputs. Performance deteriorated as the number of questions and notes increased. High-capacity models, like Llama-3-70b, had low failure rates and high accuracies. GPT-4-turbo-128k was similarly resilient across task burdens, but performance deteriorated after 50 tasks at large prompt sizes. After addressing mitigable failures, these two models can concatenate up to 50 simultaneous tasks effectively, with validation on a public medical question-answering dataset. An economic analysis demonstrated up to a 17-fold cost reduction at 50 tasks using concatenation. These results identify the limits of LLMs for effective utilization and highlight avenues for cost-efficiency at the enterprise scale.

-

Perlis R, Schweitzer K. The Low-Cost, Battery-Powered AI-Enabled Ultrasound Device That Could Improve Global Obstetric Care. JAMA. Published online December 13, 2024. doi:10.1001/jama.2024.23859

[2024/12] After a decade spent studying prenatal care in Zambia, Stringer and his fellow researchers have worked to develop an AI-enabled alternative in the form of a low-cost, battery-powered ultrasonography probe that doesn’t require trained sonographers. In a clinical trial, the tool performed similarly to standard ultrasound for measuring gestational age, offering “immediate implications for obstetrical care in low-resource settings”.

-

Naved BA, Luo Y. Contrasting rule and machine learning based digital self triage systems in the USA. npj Digit. Med. 7, 381 (2024).

[2024/12] Patient smart access and self-triage systems have been in development for decades. Although many rule-based expert systems and computational models have been released to millions of users, no health system has yet published an LLM for processing self-reported patient data. This review is the first to summarize progress in the field, including an analysis of the exact self-triage solutions available on the websites of 647 health systems in the USA.

-

Williams, M., Karim, W., Gelman, J. et al. Ethical data acquisition for LLMs and AI algorithms in healthcare. npj Digit. Med. 7, 377 (2024). 

[2024/12] Artificial intelligence (AI) algorithms will become increasingly integrated into our healthcare systems in the coming decades. These algorithms require large volumes of data for development and fine-tuning. In the United States, patient data is typically acquired for AI algorithms through an opt-out system, while other countries support an opt-in model. We argue that ethical principles around autonomy, patient ownership of data, and privacy should be prioritized in the data acquisition paradigm.

 

~

 

 

 

 

 

In the Eyes of GPTs

 

 

 I see a bottleneck that most clinical teams face because they cannot use affordable, smaller open-source LLMs, such as Llama 3.1-8B, in clinical applications. For many diseases, these LLMs do not have the high accuracy required for clinical predictions, such as diagnostic predictions. Do you believe this is a real bottleneck for doctors? How can this bottleneck be removed?

 ChatGPT

Yes, this is a real bottleneck for clinical teams, and it has significant implications for the democratization of generative AI (GenAI) in healthcare. Open-source LLMs like Llama3.1-8B offer affordability and flexibility, but their limited out-of-the-box accuracy for clinical tasks, such as diagnostic prediction, makes them unsuitable for many use cases where high accuracy is critical. Here’s an analysis of the bottleneck and potential solutions:

Why This is a Real Bottleneck

  1. Accuracy Thresholds in Clinical Settings: Clinical predictions, particularly diagnostic predictions, often require accuracy levels exceeding 85–90% due to the life-and-death stakes involved. Smaller open-source LLMs frequently fall short of these thresholds when not fine-tuned for specific medical tasks.
  2. Limited Resources in Regular Clinical Teams: Most clinical teams lack the technical expertise or computational resources to fine-tune these models effectively. This includes access to high-quality domain-specific datasets and knowledge of advanced techniques like low-rank adaptation (LoRA).
  3. Regulatory and Ethical Constraints: Using general-purpose LLMs without fine-tuning for specific medical tasks risks violating ethical and regulatory standards. Clinical decisions made with subpar tools can have serious consequences, both legally and for patient outcomes.

Strategies to Remove This Bottleneck

1. Fine-Tuning Open-Source LLMs for Clinical Use

  • LoRA Fine-Tuning: Use techniques like LoRA to efficiently fine-tune smaller open-source LLMs on task-specific datasets. This approach can significantly enhance their accuracy without requiring large-scale computational resources.
  • Synthetic Patient Data: Leverage synthetic datasets for fine-tuning, as these can help simulate diverse and rare clinical conditions while preserving patient privacy.
  • Shared Fine-Tuned Models: Create a library of pre-trained, task-specific LLMs that clinical teams can deploy directly. These models should be validated for specific use cases, such as diagnosing specific diseases.
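The LoRA technique mentioned above can be illustrated with a toy example: instead of updating a full weight matrix W, LoRA trains two small matrices B and A whose product forms a low-rank update, W' = W + (alpha/r)·BA. A minimal pure-Python sketch (the matrix sizes and values are illustrative, not tied to Llama 3.1-8B):

```python
# Toy illustration of low-rank adaptation (LoRA): instead of training
# a full d x d weight matrix, train two small matrices B (d x r) and
# A (r x d) with rank r << d, and add their scaled product to W.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

d, r = 8, 2        # hidden size and LoRA rank (illustrative)
alpha = 4          # LoRA scaling factor

W = [[0.0] * d for _ in range(d)]   # frozen base weights
B = [[0.1] * r for _ in range(d)]   # trainable, d x r
A = [[0.1] * d for _ in range(r)]   # trainable, r x d

# Effective weights: W' = W + (alpha / r) * B @ A
scale = alpha / r
BA = matmul(B, A)
W_eff = [[W[i][j] + scale * BA[i][j] for j in range(d)] for i in range(d)]

full_params = d * d              # parameters if W were trained directly
lora_params = d * r + r * d      # parameters LoRA actually trains
print(full_params, lora_params)  # 64 vs 32; the gap grows with d
```

With d in the thousands, as in real transformer layers, this parameter reduction is what makes fine-tuning feasible on modest hardware.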

2. Developing a User-Friendly GenAI Platform

  • A platform like the ELHS Copilot could guide clinical teams through the process of:
    1. Selecting the Right LLM: Choosing a suitable open-source or fine-tuned model for their needs.
    2. Testing in Simulated Environments: Using a sandbox environment to evaluate the model's performance on test cases.
    3. Deploying in Clinical Settings: Ensuring the model integrates seamlessly into electronic health records (EHRs) and other clinical workflows.

3. Democratizing Access Through Pre-Validated Models

  • Provide access to a repository of validated fine-tuned models for common clinical tasks (e.g., diagnostic prediction for diabetes, cancer staging). Validation should adhere to rigorous standards, ensuring these models are trusted and reliable.

4. Collaboration and Training for Clinical Teams

  • Workshops and Webinars: Train clinical teams on how to use GenAI tools effectively, emphasizing practical applications and ethical considerations.
  • Clinical Collaboration: Collaborate with clinical teams to fine-tune models for their specific needs, sharing the costs and benefits.

5. Integrating Feedback Loops

  • Build a system where real-world feedback from clinical deployments refines the models continuously. This could involve federated learning approaches to improve models without centralizing sensitive patient data.

Conclusion

The bottleneck arises not from the lack of technology but from the gap in enabling clinical teams to adapt affordable open-source LLMs for high-stakes applications. By providing fine-tuned, pre-validated models and empowering teams with training and accessible tools, this bottleneck can be removed. This would not only enhance diagnostic prediction accuracy but also accelerate the democratization of GenAI in healthcare, ensuring equitable access to advanced tools across all settings.

 

 I believe synthetic data is critical because it can be used to fine-tune LLMs. This step can produce fine-tuned models that are acceptable for clinical validation, thus enabling clinical teams to start using these fine-tuned models in clinical research. This will accelerate the development of fine-tuned LLMs using real patient data. What are your thoughts on this?

 ChatGPT

You are absolutely correct—synthetic data plays a pivotal role in fine-tuning LLMs for clinical applications, acting as a bridge to overcome current barriers in clinical research and model validation. This approach not only accelerates the deployment of fine-tuned LLMs but also ensures compliance with data privacy standards while enabling clinical teams to gain confidence in these models before transitioning to real patient data.

Why Synthetic Data is Critical

  1. Overcoming Data Privacy and Availability Issues: Real patient data is often inaccessible due to strict privacy regulations (e.g., HIPAA, GDPR) and institutional barriers. Synthetic data, generated to simulate real patient scenarios, avoids these restrictions while maintaining diversity and complexity.
  2. Diversity of Scenarios: Synthetic datasets can include rare diseases, edge cases, and underrepresented populations that are hard to capture in real-world datasets. This enhances the generalizability of fine-tuned models.
  3. Rapid Prototyping and Iteration: Synthetic data allows for quick experimentation and fine-tuning without waiting for institutional review board (IRB) approvals or patient consents, accelerating the feedback and improvement cycle.
  4. Benchmarking and Validation: Fine-tuned models trained on synthetic data can serve as a baseline for performance. Clinical teams can validate these models on real-world data, gradually building trust in their applicability.

How Synthetic Data Can Accelerate Development

1. Fine-Tuning LLMs with Synthetic Data: Use synthetic datasets for initial fine-tuning, tailoring models to specific tasks (e.g., diagnostic prediction, risk stratification). Employ domain-specific synthetic data generators (e.g., generative adversarial networks or simulation-based approaches) to produce high-fidelity clinical data.

2. Preparing Models for Clinical Validation: Fine-tuned LLMs on synthetic data can achieve a performance threshold suitable for validation with real-world patient data in clinical settings. This ensures that clinical teams start with a model that already demonstrates robust performance, reducing the burden of initial testing.

3. Enabling Clinical Research: Once fine-tuned models achieve acceptable accuracy levels with synthetic data, clinical teams can use these models in real-world research under strict governance. This ensures that real patient data is only introduced when models are mature and trustworthy.

4. Enhancing Feedback Loops: Synthetic data allows models to evolve through continuous feedback from both clinical research and synthetic test cases. This iterative process strengthens models before full-scale deployment.
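To connect the steps above, synthetic records must first be converted into the prompt/completion pairs that fine-tuning expects. A minimal sketch, assuming a simple JSONL instruction format (the field names and the example record are hypothetical, not drawn from any real generator):

```python
import json

# Hypothetical synthetic patient record (illustrative values only).
record = {
    "age": 54,
    "sex": "F",
    "symptoms": ["polyuria", "polydipsia", "fatigue"],
    "hba1c": 8.2,
    "diagnosis": "Type 2 diabetes mellitus",
}

def to_instruction_pair(rec):
    """Format one synthetic record as an instruction-tuning example."""
    prompt = (
        f"Patient: {rec['age']}-year-old {rec['sex']}. "
        f"Symptoms: {', '.join(rec['symptoms'])}. "
        f"HbA1c: {rec['hba1c']}%. What is the most likely diagnosis?"
    )
    return {"instruction": prompt, "response": rec["diagnosis"]}

# Write one example per line (JSONL), a common fine-tuning input format.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(to_instruction_pair(record)) + "\n")
```

In practice the same loop would run over thousands of generated records, producing the training file consumed by a fine-tuning framework.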


Steps to Implement This Strategy

  1. Create a Synthetic Data Pipeline: Develop or partner with tools that generate high-quality, diverse synthetic datasets representing various clinical conditions. Incorporate tools like FHIR-compatible synthetic data generators to ensure seamless integration into existing health IT systems.
  2. Fine-Tune and Validate Models: Fine-tune LLMs with synthetic data and evaluate performance using predefined clinical benchmarks. Collaborate with clinical experts to ensure synthetic data covers real-world scenarios effectively.
  3. Collaborate with Clinical Teams: Provide fine-tuned models to clinical teams for research, emphasizing how synthetic data mitigates privacy risks and accelerates progress. Train teams on how to integrate synthetic data and fine-tuned models into their workflows.
  4. Transition to Real-World Data: Once fine-tuned models demonstrate strong performance, validate them with real patient data under strict regulatory frameworks to ensure reliability and generalizability.

Benefits of This Approach

  • Accelerates Readiness for Clinical Use: Models fine-tuned with synthetic data are closer to deployment, saving time and resources.
  • Promotes Equity: Smaller institutions or under-resourced teams gain access to cutting-edge tools without the initial need for extensive patient data.
  • Builds Confidence in AI: Demonstrating robust performance on synthetic data reassures clinicians, paving the way for adoption.
  • Enhances Innovation: Synthetic data enables rapid experimentation, fostering creativity and discovery in clinical AI applications.

This approach aligns perfectly with your vision of democratizing GenAI in healthcare. Would you like to discuss specific tools for generating synthetic data or strategies for clinical collaboration?

 

 Yes, please list common tools and strategies for generating synthetic patient data to fine-tune disease prediction LLMs.

 ChatGPT

Here’s a comprehensive list of tools and strategies commonly used for generating synthetic patient data, particularly for fine-tuning LLMs in disease prediction:

Common Tools for Generating Synthetic Patient Data

1. Open-Source Synthetic Data Generators

  • Synthea
    • Description: A widely-used open-source tool that generates realistic, patient-level electronic health record (EHR) data based on models of disease progression and clinical workflows.
    • Strengths: Can simulate entire populations, with detailed longitudinal health data. Compatible with HL7 FHIR.
    • Use Cases: Generating datasets for chronic disease prediction, longitudinal studies, and risk stratification tasks.
  • Gretel.ai
    • Description: A platform for generating synthetic datasets, including healthcare-specific datasets. Uses differential privacy techniques to ensure data security.
    • Strengths: Easy-to-use APIs for generating and evaluating synthetic data. Focus on privacy preservation.
    • Use Cases: Disease classification and model training with privacy constraints.
  • MOST (Modular Open Synthetic Data Toolkit)
    • Description: A modular toolkit for creating synthetic data tailored to specific use cases, including healthcare.
    • Strengths: Customizable for domain-specific applications.
    • Use Cases: Rare disease simulation or generating population-specific datasets.
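The tools above produce far richer data, but their core idea, sampling patient records from explicit disease models, can be sketched in a few lines. Everything here (prevalence numbers, field names) is illustrative and not taken from Synthea or any other generator:

```python
import random

def generate_patient(rng):
    """Sample one synthetic patient from a toy disease model."""
    age = rng.randint(18, 90)
    # Toy prevalence model: diabetes risk rises with age (illustrative only).
    p_diabetes = min(0.05 + (age - 18) * 0.004, 0.4)
    has_diabetes = rng.random() < p_diabetes
    # HbA1c drawn from different ranges for diabetic vs non-diabetic patients.
    hba1c = round(rng.uniform(6.5, 10.0) if has_diabetes
                  else rng.uniform(4.5, 5.9), 1)
    return {"age": age, "hba1c": hba1c, "diabetes": has_diabetes}

rng = random.Random(42)                      # seeded for reproducibility
cohort = [generate_patient(rng) for _ in range(1000)]
print(sum(p["diabetes"] for p in cohort))    # number of positive cases
```

Real generators replace these hand-set parameters with epidemiologically grounded disease-progression modules and export standard formats such as HL7 FHIR.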

2. AI-Driven Generative Models

  • Generative Adversarial Networks (GANs)
    • Example: MedGAN
      • Description: A GAN-based model specifically designed to generate realistic EHR data.
      • Strengths: Captures complex relationships in structured and unstructured healthcare data.
      • Use Cases: Fine-tuning LLMs for tasks like comorbidity prediction and treatment outcome analysis.
  • Variational Autoencoders (VAEs)
    • Example: Customized implementations for generating patient-level time-series data (e.g., vital signs, lab results).
    • Strengths: Suitable for generating high-dimensional, multi-modal clinical data.
    • Use Cases: Time-series disease progression prediction.
  • Synthetic Data Vault (SDV)
    • Description: A Python library for generating tabular, time-series, and relational synthetic data.
    • Strengths: Focus on preserving statistical relationships.
    • Use Cases: Disease risk modeling and EHR-based studies.

3. Federated Data Synthesis

  • Federated Generative Models
    • Description: Techniques for collaboratively generating synthetic data across institutions without sharing sensitive data directly.
    • Example Tools: Flower Framework, TensorFlow Federated.
    • Use Cases: Multi-site synthetic data generation for fine-tuning global disease prediction models.

4. Commercial Tools

  • MDClone
    • Description: A commercial synthetic data platform that generates patient-level data by preserving the statistical properties of original datasets.
    • Strengths: Focuses on privacy while enabling exploratory analysis and research.
    • Use Cases: Disease prediction, cohort simulation, and health outcomes research.
  • DataRobot AI Cloud
    • Description: Offers tools for creating healthcare-specific synthetic datasets as part of its broader AI platform.
    • Strengths: Enterprise-ready solution for scalable synthetic data generation.

Strategies for Generating Synthetic Data for Disease Prediction

1. Tailored Dataset Generation

  • Use disease-specific models (e.g., for diabetes, cardiovascular diseases) to create synthetic datasets enriched with patient demographics, risk factors, and disease progression markers.
  • Strategy: Define disease parameters (e.g., onset age, comorbidities) based on epidemiological data.

2. Augmentation of Limited Real Datasets

  • Use synthetic data to augment small real-world datasets, ensuring broader coverage of rare cases and edge conditions.
  • Tools: GANs, SMOTE (Synthetic Minority Oversampling Technique).
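SMOTE's core operation is simple enough to sketch: take a minority-class sample, pick one of its nearest minority-class neighbors, and interpolate a new point between them. A minimal pure-Python version (real implementations, such as imbalanced-learn's, add proper k-NN selection and class balancing; the data points are illustrative):

```python
import math
import random

def smote_sample(minority, rng, k=2):
    """Generate one synthetic minority-class point by interpolation."""
    base = rng.choice(minority)
    # Find the k nearest minority neighbors of the chosen base point.
    neighbors = sorted(
        (p for p in minority if p is not base),
        key=lambda p: math.dist(base, p),
    )[:k]
    neighbor = rng.choice(neighbors)
    # Interpolate at a random fraction of the way toward the neighbor.
    t = rng.random()
    return tuple(b + t * (n - b) for b, n in zip(base, neighbor))

# Toy minority class: three 2-D feature vectors (illustrative values).
minority = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8)]
rng = random.Random(0)
new_point = smote_sample(minority, rng)
print(new_point)   # lies on a segment between two existing points
```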

3. Longitudinal Data Simulation

  • Generate patient timelines showing disease progression, treatment, and outcomes.
  • Tools: Synthea, custom time-series generative models.

4. Multi-Modal Data Synthesis

  • Combine structured (e.g., lab tests) and unstructured (e.g., clinical notes) synthetic data to train LLMs capable of understanding complex healthcare scenarios.
  • Tools: GPT-like models, VAEs for text generation.

5. Real-World Validation

  • Test synthetic data against real-world data to ensure the distributions align and validate the predictive performance of fine-tuned models.
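A first-pass version of this alignment check can be as simple as comparing summary statistics of the same variable in real and synthetic cohorts; a full validation would add distributional tests and downstream model performance. A sketch with illustrative numbers:

```python
import statistics

# Illustrative HbA1c values from a real and a synthetic cohort.
real_hba1c = [5.2, 5.6, 7.8, 6.1, 9.0, 5.4, 6.8, 7.2]
synthetic_hba1c = [5.3, 5.8, 7.5, 6.0, 8.7, 5.5, 6.6, 7.4]

def aligned(real, synth, tol=0.5):
    """Crude alignment check: means and stdevs agree within a tolerance."""
    return (abs(statistics.mean(real) - statistics.mean(synth)) < tol
            and abs(statistics.stdev(real) - statistics.stdev(synth)) < tol)

print(aligned(real_hba1c, synthetic_hba1c))
```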

Integrating Synthetic Data in Fine-Tuning Pipelines

  1. Define Target Prediction Tasks: Clearly outline the disease-specific prediction goals (e.g., early diagnosis, risk stratification).
  2. Generate Diverse Synthetic Data: Ensure the data covers patient subgroups, disease phenotypes, and comorbidities.
  3. Fine-Tune Open-Source LLMs: Use lightweight techniques like LoRA to adapt pre-trained LLMs on synthetic datasets.
  4. Evaluate and Iterate: Benchmark fine-tuned models on synthetic data, then validate using real-world datasets in research settings.

 

~

 

 

GenAI/LHS Toolbox

 

 

ELHS GenAI Copilot Platform

The new copilot platform makes it easy for doctors and patients to use GenAI:

  1. Use the free multi-LLM copilot, including GPT-4o, Gemini 1.5 Pro, and Llama 3.1.
  2. Use a free personal open-source LLM chatbot, i.e., My Copilot.
  3. Choose and deploy a fine-tuned open-source Llama 3.1 model as your copilot.

 

Meta AI

Meta AI helps you learn, create and connect in new ways. With our latest features you can now talk with recognizable voices and share images with Meta AI to learn more about your surroundings. Ask Meta AI anything.

 

~

 

ELHS Institute Relevant Resources:

ELHS GenAI Copilot Platform

ELHS Institute Website

ELHS Newsletters

ELHS Videos

LHS Tech Forums

Prof. Andrew Ng AI Classes

Stanford University RAISE Health Initiative

 

~ the end ~