Real-world performance of open-source large language models in diabetes diagnosis

Mar 25, 2026

Shu-Ting YANG

1st Author

Sujie Liu

Co-1st Author

Yuxi Ma

Co-1st Author

Bao-Wen GAI

Junwei Liu

Liansheng Wang

Co-corresponding Author

Feng GAO

Co-corresponding Author

Zhiguang Zhou

Co-corresponding Author

Abstract

This study evaluated diverse open-source large language models for diagnosing diabetes subtypes and comorbidities from unstructured clinical narratives in a large real-world Chinese cohort of 11,329 adults. The models performed strongly on complex diabetes subtype classification, reaching a peak F1 score of 0.951, while remaining less reliable for more rule-based tasks such as diabetic kidney disease and metabolic syndrome diagnosis. The results suggest that open-source LLMs are valuable clinical co-pilots for complex pattern recognition, with current limitations in procedural diagnostic reasoning.

Type

Journal article

Publication

Frontiers in Endocrinology

Last updated on Mar 25, 2026

Journal Article, Diabetes, Large Language Models, Clinical AI, Real-World Study

Authors

Shu-Ting YANG

Physician

An endocrinologist passionate about artificial intelligence and diabetes, focusing on applying machine learning and bioinformatics to diabetes heterogeneity research.

Bao-Wen GAI

PhD Student

I am a PhD student working on AI methods for colorectal cancer diagnosis and prognosis.

Feng GAO

Professor

My research leverages AI and big data to improve diagnostics, prognostics, and ultimately, outcomes in cancer and other biomedical fields.
