Real-world performance of open-source large language models in diabetes diagnosis

Mar 25, 2026
Shu-Ting YANG (1st Author), Sujie Liu (Co-1st Author), Yuxi Ma (Co-1st Author), Bao-Wen GAI, Junwei Liu, Liansheng Wang (Co-corresponding Author), Feng GAO (Co-corresponding Author), Zhiguang Zhou (Co-corresponding Author)
Abstract
This study evaluated diverse open-source large language models for diagnosing diabetes subtypes and comorbidities from unstructured clinical narratives in a large real-world Chinese cohort of 11,329 adults. The models performed strongly on complex diabetes subtype classification, reaching a peak F1 score of 0.951, while remaining less reliable for more rule-based tasks such as diabetic kidney disease and metabolic syndrome diagnosis. The results suggest that open-source LLMs are valuable clinical co-pilots for complex pattern recognition, with current limitations in procedural diagnostic reasoning.
Type
Publication
Frontiers in Endocrinology
Authors
Shu-Ting YANG
Physician
An endocrinologist passionate about artificial intelligence and diabetes, focusing on applying machine learning and bioinformatics to diabetes heterogeneity research.
Bao-Wen GAI
PhD Student
I am a PhD student working on AI methods for colorectal cancer diagnosis and prognosis.
Feng GAO
Professor
My research leverages AI and big data to improve diagnostics, prognostics, and ultimately, outcomes in cancer and other biomedical fields.