AD-CARE: A Guideline-grounded, Modality-agnostic LLM Agent for Real-world Alzheimer's Disease Diagnosis with Multi-cohort Assessment, Fairness Analysis, and Reader Study

Wenlong Hou, Sheng Bi, Guangqian Yang, Lihao Liu, Ye Du, Hanxiao Xue, Juncheng Wang, Yuxiang Feng, Yue Xun, Nanxi Yu, Ning Mao, Mo Yang, Yi Wah Eva Cheung, Ling Long, Kay Chen Tan, Lequan Yu, Xiaomeng Ma, Shaozhen Yan, Shujun Wang

Alzheimer's disease (AD) is a growing global health challenge as populations age, and timely, accurate diagnosis is essential to reduce individual and societal burden. However, real-world AD assessment is hampered by incomplete, heterogeneous multimodal data and by variability across sites and patient demographics. Although large language models (LLMs) have shown promise in biomedicine, their use in AD has largely been confined to answering narrow, disease-specific questions rather than generating comprehensive diagnostic reports that support clinical decision-making. Here we expand LLM capabilities for clinical decision support by introducing AD-CARE, a modality-agnostic agent that performs guideline-grounded diagnostic assessment from incomplete, heterogeneous inputs without imputing missing modalities. By dynamically orchestrating specialized diagnostic tools and embedding clinical guidelines into LLM-driven reasoning, AD-CARE generates transparent, report-style outputs aligned with real-world clinical workflows. Across six cohorts comprising 10,303 cases, AD-CARE achieved 84.9% diagnostic accuracy, a relative improvement of 4.2%-13.7% over baseline methods. Despite cohort-level differences, dataset-specific accuracies remained robust (80.4%-98.8%), and the agent consistently outperformed all baselines. AD-CARE also reduced performance disparities across racial and age subgroups, decreasing the average dispersion of four metrics by 21%-68% and 28%-51%, respectively. In a controlled reader study, the agent improved neurologist and radiologist accuracy by 6%-11% and more than halved decision time. The framework yielded absolute gains of 2.29%-10.66% over eight backbone LLMs and brought their performance closer together. These results show that AD-CARE is a scalable, practically deployable framework that can be integrated into routine clinical workflows for multimodal decision support in AD.
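To make the idea of modality-agnostic assessment without imputation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a case is a dictionary in which an uncollected modality is `None`, and that each diagnostic tool declares the modalities it needs. The names `Tool`, `Report`, `assess`, and the three toy tools are hypothetical and introduced only for illustration. The LLM-driven tool selection and guideline-grounded reasoning described in the abstract are omitted here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical case record: a modality that was not collected is None.
Case = Dict[str, Optional[object]]


@dataclass
class Tool:
    name: str
    requires: List[str]         # modalities this tool needs as input
    run: Callable[[Case], str]  # returns a short textual finding


@dataclass
class Report:
    findings: Dict[str, str] = field(default_factory=dict)
    skipped: Dict[str, List[str]] = field(default_factory=dict)


def assess(case: Case, tools: List[Tool]) -> Report:
    """Run only the tools whose inputs are present; never fill in missing data."""
    present = {name for name, value in case.items() if value is not None}
    report = Report()
    for tool in tools:
        missing = [m for m in tool.requires if m not in present]
        if missing:
            # Record why the tool was skipped instead of imputing its inputs.
            report.skipped[tool.name] = missing
        else:
            report.findings[tool.name] = tool.run(case)
    return report


# Toy tools (assumptions for illustration, not the paper's tool set).
TOOLS = [
    Tool("cognitive_screen", ["mmse"], lambda c: f"MMSE score {c['mmse']}"),
    Tool("mri_reader", ["mri"], lambda c: "MRI reviewed"),
    Tool("biomarker_panel", ["csf", "apoe"], lambda c: "CSF and APOE reviewed"),
]

# A case with only a cognitive score and an APOE genotype available.
case = {"mmse": 22, "mri": None, "csf": None, "apoe": "e3/e4"}
report = assess(case, TOOLS)
print(report.findings)
print(report.skipped)
```

In this sketch the report keeps an explicit record of which tools were skipped and why, which is one plausible way to keep the output transparent when inputs are incomplete. A real agent would pass the findings, together with guideline text, to an LLM to draft the diagnostic report.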
