We have published a study in JAMIA describing a novel approach to benchmarking ChatGPT.


We pioneered the use of live symptom-checking services, such as the Mayo Clinic Symptom Checker, as benchmarks for ChatGPT's symptom-checking capabilities. Across a broad range of 194 diseases, the symptom-checking accuracy of the GPT-4 model approached 80%, which qualifies ChatGPT for clinical studies.


Chen A, Chen DO, Tian L. Benchmarking the symptom-checking capabilities of ChatGPT for a broad range of diseases. J Am Med Inform Assoc. 2023 Dec 18:ocad245. doi: 10.1093/jamia/ocad245.