[Paving the Way for Global Health Equity with Generative AI and Learning Health Systems (LHS)]
DeepSeek took the world by storm last month, emerging as a new superstar in the GenAI era. It made history by contributing to an estimated $1 trillion drop in U.S. stock market value in a single day. Among its many claims, DeepSeek reported that DeepSeek-V3, the base model behind its top-performing R1 reasoning model, was trained for less than $6 million, a stark contrast to the billions of dollars previously accepted as the industry standard. Such a drastic reduction in cost makes it undeniably disruptive.
However, to prove itself as a true original innovator, DeepSeek faces significant challenges ahead.
The first challenge is to clarify the true identity of the model behind its API service. The open-source DeepSeek-V3 model has repeatedly identified itself as ChatGPT, developed by OpenAI, raising serious concerns. Additionally, the official DeepSeek API Platform has been out of service since January 29, 2025, further deepening the mystery.
The second challenge is to maintain technological momentum by releasing the next generation of LLMs. DeepSeek must prove its staying power as a global leader in LLM technology.
At ELHS Institute, we provide preclinically validated, fine-tuned open-source LLMs for clinical teams researching GenAI in healthcare. More powerful open-source models, such as DeepSeek’s LLMs, would be a welcome addition alongside the Llama and Qwen models. However, as a responsible AI provider, we must ensure that DeepSeek-V3 is free from infringement risks before integrating it into real-world client projects. Otherwise, any work involving DeepSeek could be jeopardized if OpenAI sues for infringement.
To mitigate this risk, I conducted a brief investigation, which suggests significant legal concerns surrounding DeepSeek’s IP standing. To protect our clients and ourselves, my conclusion is to postpone using DeepSeek in client projects until all infringement concerns are clearly addressed. You can watch my short educational video for details on how to safeguard yourself before using DeepSeek.
Beyond DeepSeek: Recent Advances in Healthcare GenAI
Last month saw remarkable progress in healthcare GenAI research, including:
🏥 A Stanford team introduced ambient AI technology for ICU rooms.
🏥 Google’s Med-PaLM researchers report that both specialists and generalist physicians rated Med-PaLM 2 responses as safe as physician answers.
📜 A NEJM AI paper highlights three key LLM opportunities for improving health equity: detecting human bias, creating structured datasets relevant to health equity, and enhancing equitable access to health information.
🩺 A Chinese team reported a generalist medical language model for disease diagnosis assistance, claiming its diagnostic accuracy outperforms GPT-4o.
Our own LLM research has demonstrated that fine-tuning Llama 3.1-8B with synthetic patient data can achieve >90% accuracy in predicting various diseases. Combined with the emerging machine learning-enabled Learning Health System (ML-LHS) unit concept, I believe GenAI can be leveraged effectively to build equitable LHS units, ultimately enhancing health equity in real-world clinical care.
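For readers who want to experiment with this kind of fine-tuning themselves, a minimal LoRA sketch in Python is shown below, assuming the Hugging Face transformers, peft, and datasets libraries. The base model ID, the synthetic_patients.jsonl file, and all hyperparameters are illustrative placeholders, not the exact ELHS pipeline.

# Illustrative LoRA fine-tuning sketch (assumptions: gated access to the base
# model and a hypothetical JSONL file of synthetic patient cases).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Llama-3.1-8B-Instruct"  # example base model
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token

model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

# Each record: {"text": "Patient presentation ... Diagnosis: ..."} (hypothetical format)
ds = load_dataset("json", data_files="synthetic_patients.jsonl", split="train")
ds = ds.map(lambda x: tok(x["text"], truncation=True, max_length=1024), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama31-disease-lora", num_train_epochs=3,
                           per_device_train_batch_size=2, learning_rate=2e-4, fp16=True),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()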
Keep reading to explore the latest research papers below, and enjoy my conversations with ChatGPT about the legal aspects of responsible AI.
Warm regards,
AJ
AJ Chen, PhD | ELHS Institute | https://elhsi.org
~
(Recent papers, news, and events showcasing the progress of GenAI and LHS)
Singhal, K., Tu, T., Gottweis, J. et al. Toward expert-level medical question answering with large language models. Nat Med (2025).
[2025/1] In a pilot study using real-world medical questions, specialists preferred Med-PaLM 2 answers to generalist physician answers 65% of the time. While specialist answers were still preferred overall, both specialists and generalists rated Med-PaLM 2 to be as safe as physician answers, demonstrating its growing potential in real-world medical applications.
-
Naved, B.A., Luo, Y. Contrasting rule and machine learning based digital self triage systems in the USA. npj Digit. Med. 7, 381 (2024).
[2024/12] Patient smart access and self-triage systems have been in development for decades. To date, no LLM for processing self-reported patient data has been published by health systems, while many expert systems and computational models have been released to millions of users. This review is the first to summarize progress in the field, including an analysis of the exact self-triage solutions available on the websites of 647 health systems in the USA.
-
Kopka, M., Wang, S.M., Kunz, S. et al. Technology-supported self-triage decision making. npj Health Syst. 2, 3 (2025).
[2025/1] Users tend to rely on technology for information gathering and analysis but remain responsible for information integration and the final decision. Based on these results, we developed a model for technology-assisted self-triage decision-making. Our quantitative results indicate that when using a high-performing symptom assessment app (SAA), laypeople’s decision accuracy improved from 53.2% to 64.5% (OR = 2.52, p < 0.001). In contrast, decision accuracy remained unchanged when using an LLM (54.8% before vs. 54.2% after usage, p = 0.79). These findings highlight the importance of studying SAAs/LLMs with humans in the loop, as opposed to analyzing them in isolation.
-
Liu, X., Liu, H., Yang, G. et al. A generalist medical language model for disease diagnosis assistance. Nat Med (2025).
[2025/1] Here we present MedFound, a generalist medical language model with 176 billion parameters, pre-trained on a large-scale corpus derived from diverse medical text and real-world clinical records. We further fine-tuned MedFound to learn physicians’ inferential diagnosis with a self-bootstrapping strategy-based chain-of-thought approach and introduced a unified preference alignment framework to align it with standard clinical practice. We conducted a comprehensive evaluation of the clinical applicability of LLMs for diagnosis involving artificial intelligence (AI) versus physician comparison, AI-assistance study and human evaluation framework.
-
Xiang, J., Wang, X., Zhang, X. et al. A vision–language foundation model for precision oncology. Nature (2025).
[2025/1] In this study, we developed the Multimodal transformer with Unified maSKed modeling (MUSK), a vision–language foundation model designed to leverage large-scale, unlabelled, unpaired image and text data. MUSK showed strong performance in outcome prediction, including melanoma relapse prediction, pan-cancer prognosis prediction and immunotherapy response prediction in lung and gastro-oesophageal cancers. MUSK effectively combined complementary information from pathology images and clinical reports and could potentially improve diagnosis and precision in cancer therapy.
-
Koirala, P., Thongprayoon, C., Miao, J. et al. Evaluating AI performance in nephrology triage and subspecialty referrals. Sci Rep 15, 3455 (2025).
[2025/1] This study evaluates the utility of ChatGPT, a large language model, in triaging nephrology cases through simulated real-world scenarios. Two nephrologists created 100 patient cases that encompassed various aspects of nephrology. ChatGPT’s performance in determining the appropriateness of nephrology consultations and identifying suitable nephrology subspecialties was assessed. The results demonstrated high accuracy; ChatGPT correctly determined the need for nephrology in 99–100% of cases, and it accurately identified the most suitable nephrology subspecialty triage in 96–99% of cases across two evaluation rounds. The agreement between the two rounds was 97%. While ChatGPT showed promise in improving medical triage efficiency and accuracy, the study also identified areas for refinement. This included the need for better integration of multidisciplinary care for patients with complex, intersecting medical conditions. This study’s findings highlight the potential of AI in enhancing decision-making processes in clinical workflow, and it can inform the development of AI-assisted triage systems tailored to institution-specific practices including multidisciplinary approaches.
-
Pierson E, et al. Using Large Language Models to Promote Health Equity. NEJM AI 2025;2(2)
[2025/1] While the discussion about the effects of large language models (LLMs) on health equity has been largely cautionary, LLMs also present significant opportunities for improving health equity. We highlight three such opportunities: improving the detection of human bias; creating structured datasets relevant to health equity; and improving equity of access to health information.
-
Dai W. et al. Developing ICU Clinical Behavioral Atlas Using Ambient Intelligence and Computer Vision. NEJM AI 2025;2(2).
[2025/1] We present Clinical Behavioral Atlas (CBA), a computer vision system that can identify 40 clinically relevant activity categories and 55 object categories solely through RGB video data. The system was developed using a dataset comprising over 140,000 hours of continuous video and over 350,000 densely annotated frames, collected from 16 sensors in 8 ICU rooms at an academic medical center. The model demonstrated strong performance in entity and activity detection, with sensitivities of 0.75–0.81 and average precisions of 0.64–0.73, respectively.
-
Kumar A., et al. A Cross-Sectional Study of GPT-4–Based Plain Language Translation of Clinical Notes to Improve Patient Comprehension of Disease Course and Management. NEJM AI 2025;2(2)
[2025/1] This study introduces a novel clinical application of Generative Pretrained Transformer 4 (GPT-4), assessing the effect of GPT-4–based plain language translation of discharge summary notes (DSNs) on subjective and objective comprehension, self-reported confidence, and time spent reading each DSN. GPT-4–based translation substantially improved patient comprehension of DSNs, especially in populations that historically have low health literacy.
-
Johri, S., Jeong, J., Tran, B.A. et al. An evaluation framework for clinical use of large language models in patient interaction tasks. Nat Med 31, 77–86 (2025).
[2025/1] This paper introduces the Conversational Reasoning Assessment Framework for Testing in Medicine (CRAFT-MD) approach for evaluating clinical LLMs. We applied CRAFT-MD to assess the diagnostic capabilities of GPT-4, GPT-3.5, Mistral and LLaMA-2-7b across 12 medical specialties. Our experiments revealed critical insights into the limitations of current LLMs in terms of clinical conversational reasoning, history-taking and diagnostic accuracy. These limitations also persisted when analyzing multimodal conversational and visual assessment capabilities of GPT-4V. We propose a comprehensive set of recommendations for future evaluations of clinical LLMs based on our empirical findings.
-
Metwally, A.A., Perelman, D., Park, H. et al. Prediction of metabolic subphenotypes of type 2 diabetes via continuous glucose monitoring and machine learning. Nat. Biomed. Eng (2024).
[2025/1] Machine-learning models trained with glucose time series from OGTTs from the 32 individuals predicted the subphenotypes with areas under the curve (AUCs) of 95% for muscle insulin resistance, 89% for β-cell deficiency and 88% for impaired incretin action. With CGM-generated glucose curves obtained during at-home OGTTs, the models predicted the muscle-insulin-resistance and β-cell-deficiency subphenotypes of 29 individuals with AUCs of 88% and 84%, respectively. At-home identification of metabolic subphenotypes via a CGM may aid the risk stratification of individuals with early glucose dysregulation.
-
Lopez, I., Swaminathan, A., Vedula, K. et al. Clinical entity augmented retrieval for clinical information extraction. npj Digit. Med. 8, 45 (2025).
[2025/1] We introduce CLinical Entity Augmented Retrieval (CLEAR), a RAG pipeline that retrieves information using clinical entities. CLEAR achieves a >70% reduction in token usage and inference time, with improved performance compared to modern methods.
-
Eisemann, N., Bunk, S., Mukama, T. et al. Nationwide real-world implementation of AI for cancer detection in population-based mammography screening. Nat Med (2025).
[2025/1] Radiologists in the AI-supported screening group achieved a breast cancer detection rate of 6.7 per 1,000, which was 17.6% (95% confidence interval: +5.7%, +30.8%) higher than and statistically superior to the rate (5.7 per 1,000) achieved in the control group. Compared to standard double reading, AI-supported double reading was associated with a higher breast cancer detection rate without negatively affecting the recall rate, strongly indicating that AI can improve mammography screening metrics.
-
Liddicoat, J.E., Lenarczyk, G., Aboy, M. et al. A policy framework for leveraging generative AI to address enduring challenges in clinical trials. npj Digit. Med. 8, 33 (2025).
[2025/1] Can artificial intelligence improve clinical trial design? Despite their importance in medicine, over 40% of trials involve flawed protocols. We introduce and propose the development of application-specific language models (ASLMs) for clinical trial design across three phases: ASLM development by regulatory agencies, customization by Health Technology Assessment bodies, and deployment to stakeholders. This strategy could enhance trial efficiency, inclusivity, and safety, leading to more representative, cost-effective clinical trials.
-
Mao, X., Huang, Y., Jin, Y. et al. A phenotype-based AI pipeline outperforms human experts in differentially diagnosing rare diseases using EHRs. npj Digit. Med. 8, 68 (2025).
[2025/1] PhenoBrain utilizes a BERT-based natural language processing model to extract phenotypes from clinical texts in EHRs and employs five new diagnostic models for differential diagnoses of rare diseases. The AI system was developed and evaluated on diverse, multi-country rare disease datasets, comprising 2271 cases with 431 rare diseases. In 1936 test cases, PhenoBrain achieved an average predicted top-3 recall of 0.513 and a top-10 recall of 0.654, surpassing 13 leading prediction methods. In a human-computer study with 75 cases, PhenoBrain exhibited exceptional performance with a top-3 recall of 0.613 and a top-10 recall of 0.813, surpassing the performance of 50 specialist physicians and large language models like ChatGPT and GPT-4. Combining PhenoBrain’s predictions with specialists increased the top-3 recall to 0.768, demonstrating its potential to enhance diagnostic accuracy in clinical workflows.
~
(AJ in conversation with ChatGPT to understand how machines view human healthcare)
We have talked about Healthcare Responsible AI before. One of its basic components is being free of infringement. How can users check for potential infringement in any LLM or GenAI tool?
ChatGPT
Yes, we've discussed Healthcare Responsible AI. Ensuring that any LLM or GenAI tool is free of infringement is crucial for its ethical and legal use. Users can take the following steps to check for potential intellectual property (IP) infringement:
1. Verify the Model’s License
Check the license type (e.g., Apache 2.0, MIT, GPL, commercial, proprietary).
Open-source models like Meta’s Llama, Mistral, or OpenAI’s older models (GPT-2) usually provide clear license terms.
Proprietary models (GPT-4, Claude, Gemini) may have restrictions on modifications and deployment.
Red flag: If a model is labeled as “open-source” but has vague or missing license details.
Example Check (see the Python sketch after this list):
For Hugging Face models: Look at the model card and LICENSE file.
For OpenAI, DeepSeek, or Gemini: Check the official documentation and terms of service.
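For models hosted on Hugging Face, a quick programmatic license check might look like the sketch below, using the huggingface_hub package; the repository ID is only an example, and an empty result is exactly the "red flag" described above.

# Quick license check for a Hugging Face-hosted model (repo id is an example).
from huggingface_hub import list_repo_files, model_info

repo = "deepseek-ai/DeepSeek-V3"  # example repository
info = model_info(repo)

print("License tags:", [t for t in info.tags if t.startswith("license:")])
print("Card metadata license:", getattr(info.card_data, "license", None))
print("LICENSE file present:",
      any(f.upper().startswith("LICENSE") for f in list_repo_files(repo)))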
2. Inspect Training Data Sources
Many infringement claims arise from using copyrighted datasets in model training.
Publicly available data ≠ legally usable data (e.g., scraping books, journal articles, or proprietary medical databases).
Check if the model was trained using public domain, open datasets (like PubMed, MIMIC-IV), or proprietary data.
Example Check (a quick programmatic check follows this list):
Look for references to Common Crawl, Wikipedia, ArXiv, PubMed, or other verifiable open datasets.
If trained on commercial medical databases (e.g., UpToDate, NEJM, Springer) without licensing agreements, it might be infringing.
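One quick, if limited, programmatic check is to list the training datasets a Hugging Face model card declares; most proprietary-scale LLMs declare none, in which case you must fall back on the provider's technical report. The sketch below assumes the huggingface_hub package, and the repository ID is only an example.

# List training datasets declared in a model card (often empty for large LLMs).
from huggingface_hub import model_info

info = model_info("meta-llama/Llama-3.1-8B")  # example repository
declared = [t.split(":", 1)[1] for t in info.tags if t.startswith("dataset:")]
print("Declared datasets:", declared or "none declared - check the technical report")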
3. Compare Outputs Against Copyrighted Content
Run queries to test if the model memorized and reproduces copyrighted material.
Example test: Ask it to summarize or repeat sections from paywalled medical journals, books, or clinical guidelines.
If it generates near-exact excerpts from copyrighted sources, the model may have been trained on proprietary data.
Example Check (see the sketch after this list):
Try prompting: “Summarize the latest clinical guidelines from the European Society of Cardiology.”
If the response contains near-verbatim content from official guidelines, it might indicate potential infringement.
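A simple way to operationalize this spot check is to compare the model's answer against a passage you are licensed to hold, as in the Python sketch below; the placeholder strings and the 0.8 threshold are illustrative choices, not a legal standard.

# Memorization spot check: compare a model answer to a licensed reference passage.
from difflib import SequenceMatcher

def verbatim_ratio(generated: str, reference: str) -> float:
    """Similarity in [0, 1]; values near 1.0 suggest near-verbatim reproduction."""
    return SequenceMatcher(None, generated.lower(), reference.lower()).ratio()

# Replace with the model's actual answer and an excerpt you are licensed to hold.
generated_answer = "...model output for 'Summarize the latest ESC guidelines'..."
reference_excerpt = "...the corresponding paragraph from the official guideline..."

score = verbatim_ratio(generated_answer, reference_excerpt)
print(f"Similarity: {score:.2f}")
if score > 0.8:  # illustrative threshold
    print("Possible memorization of copyrighted text - investigate before deployment.")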
4. Check for Model Weights and Code Origin
Some AI companies clone or modify existing LLMs (e.g., OpenAI’s GPT-4) but claim them as their own.
If the architecture or behavior is suspiciously similar to a known proprietary model, it could indicate unauthorized use.
Example Check (see the sketch after this list):
Compare response styles, embeddings, and attention patterns with existing LLMs.
Run adversarial prompts—if two different models give identical responses with specific phrasing, it might indicate unauthorized replication.
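One rough signal is to embed answers from the two models to the same adversarial prompts and flag near-identical phrasing, as in the sketch below. The sentence-transformers encoder and the 0.95 threshold are illustrative choices, and high similarity alone is not proof of replication; it only marks cases worth deeper review.

# Flag suspiciously similar answers from two models on the same prompts.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

# Replace with real outputs collected from the two models under comparison.
answers_model_a = ["I am ChatGPT, a language model developed by OpenAI."]
answers_model_b = ["I am ChatGPT, a large language model trained by OpenAI."]

for a, b in zip(answers_model_a, answers_model_b):
    sim = util.cos_sim(encoder.encode(a), encoder.encode(b)).item()
    print(f"cosine similarity = {sim:.2f}")
    if sim > 0.95:  # illustrative threshold
        print("Near-identical phrasing - a possible sign of cloned or distilled outputs.")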
5. Investigate Previous Legal or Ethical Issues
Some AI models have faced legal challenges over data usage (e.g., The New York Times v. OpenAI, the lawsuits against Stability AI).
Research if the model provider has been accused of scraping proprietary content or violating copyright laws.
Example Check:
Look up the model provider’s name + "lawsuit" or "copyright violation."
Monitor industry discussions (Hugging Face forums, Twitter AI community, GitHub issues).
6. Check Compliance with Healthcare AI Standards
HIPAA, GDPR, and FDA regulations require AI models handling medical data to ensure lawful data usage.
If an AI tool is trained on private patient data without explicit consent, it could violate data protection laws.
Example Check:
Ask: “Does this AI comply with HIPAA, GDPR, or FDA regulations?”
If the company cannot provide a clear data provenance or audit trail, it’s a red flag.
7. Consult Legal or Compliance Experts
If you're planning to deploy or fine-tune an AI model, consult:
AI-focused IP lawyers
Regulatory experts (HIPAA, GDPR, FDA, MHRA)
In-house legal teams (for institutions)
They can assess risks based on specific contracts, licenses, and data sources.
Final Thoughts
Ensuring Healthcare Responsible AI means making sure models are:
✅ Legally compliant
✅ Ethically trained
✅ Transparent in data usage
✅ Safe for patients
✅ Effective in prediction
✅ Protective of patient privacy and data security
✅ Equitable in access
What are the risks involved in using a software tool that infringes on someone else's IP?
ChatGPT
Using a software tool that infringes on someone else’s intellectual property (IP) poses significant legal, financial, and reputational risks. These risks can impact individuals, healthcare institutions, and companies deploying LLMs or GenAI tools in clinical settings. Below are the key risks:
1. Legal Risks
⚖️ Copyright Infringement Lawsuits
If a tool uses copyrighted text, code, images, or datasets without permission, the copyright holder can sue.
Example: OpenAI is facing lawsuits from media companies for using copyrighted news articles in training GPT models.
Risk: A court may order fines, content removal, or a complete shutdown of the tool.
📜 Patent Violation
If a software tool copies an AI model’s architecture, algorithms, or fine-tuning methods without licensing, it can infringe patents.
Example: Companies like IBM, Google, and OpenAI hold patents on AI technologies.
Risk: Patent holders can demand licensing fees, block usage, or sue for damages.
🔐 Trade Secret Violations
If an AI tool incorporates proprietary datasets or techniques from a competitor without permission, it could breach trade secret laws.
Example: If a company reverse-engineers a GPT-4-level model and claims it as their own, it may be using OpenAI’s confidential IP.
Risk: Heavy legal penalties, injunctions, or corporate espionage charges.
⚠️ Data Privacy Violations (HIPAA, GDPR)
If a tool uses protected patient data (PHI) or personal data without consent, it may violate:
HIPAA (US) for health data.
GDPR (EU) for any personally identifiable information.
CCPA (California) for consumer data rights.
Example: AI trained on unauthorized patient records could result in lawsuits and compliance fines.
Risk: Substantial regulatory fines (GDPR penalties can reach €20 million or 4% of global annual revenue; HIPAA penalties can reach millions of dollars per year).
2. Financial Risks
💰 Fines and Settlement Costs
Infringement lawsuits can cost millions in legal fees and settlements.
Example: Getty Images sued Stability AI for using copyrighted photos in training, seeking billions in damages.
🚫 Loss of Licensing and Business Contracts
Companies using infringing AI models may lose partnerships, funding, or government contracts.
Example: Hospitals and pharma companies require HIPAA-compliant AI—if a tool is found infringing, they must stop using it immediately.
💸 Forced Product Shutdown or Retraction
If an AI tool is found infringing, a court may force a recall or order developers to disable key features.
Example: If a healthcare AI assistant relies on proprietary guidelines (e.g., UpToDate, NEJM), the owners can force it offline.
3. Reputational Risks
🔴 Loss of Trust in AI Models
If a healthcare AI tool is linked to IP theft, doctors, patients, and institutions may stop using it.
Example: If a GenAI copilot in healthcare is trained on scraped medical textbooks, professionals won’t trust its recommendations.
📰 Negative Media Coverage
AI plagiarism scandals make headlines, causing brand damage.
Example: If a company is found using stolen medical datasets, it could face public backlash and regulatory probes.
💼 Blacklisting from Future AI Deployments
Hospitals, pharma companies, and regulators may blacklist infringing AI vendors.
Example: If an AI startup is sued for IP violations, future healthcare partnerships could collapse.
4. Ethical Risks
⚠️ Unfair Use of Other People’s Work
If an AI tool profits from copyrighted research papers, medical books, or software, it is unfair to the original creators.
Example: A model trained on stolen clinical trial data could provide insights without crediting the researchers.
🛑 AI Bias and Hallucinations
Unverified, scraped data from copyrighted sources may include bias or misinformation, leading to:
Inaccurate medical recommendations.
Misinformation spread in clinical decision support.
5. Operational Risks
🔄 Forced Model Retraining
If a model is found to have been trained on infringing data, the company may need to:
Remove the dataset.
Retrain the model, which is costly.
Accept the loss of months or years of progress.
⏳ Deployment Delays
IP disputes can halt AI adoption in hospitals and clinics.
Example: If an FDA-approved AI tool is sued for IP infringement, hospitals may have to stop using it until the case is resolved.
How to Avoid These Risks?
✅ Check licenses before using AI tools.
✅ Use only authorized datasets (e.g., open-access medical sources).
✅ Verify AI compliance with HIPAA, GDPR, and AI Act regulations.
✅ Run IP audits before deploying AI models in healthcare.
✅ Use open-source AI models with clear licensing (e.g., Mistral models under Apache 2.0; Llama models under the Llama Community License).
~
The new copilot platform makes it easy for doctors and patients to use GenAI: 🚀 Fine-tune Llama 3.1-8B LLM for high-accuracy disease prediction, overcoming the bottlenecks that most clinical teams face in GenAI clinical research and applications.
Meta AI helps you learn, create and connect in new ways. With our latest features you can now talk with recognizable voices and share images with Meta AI to learn more about your surroundings. Ask Meta AI anything.
~
👉For LLM Fine-Tuning Services, Contact support@elhsi.org 📩
~ the end ~
Democratizing GenAI and LHS in Healthcare to Help Achieve Global Health Equity
info@elhsi.org
Palo Alto, California, USA