What Do LLMs Know About Alzheimer's Disease? Multi-loss Fine-Tuning and Probing for AD Detection

Reliable early detection of Alzheimer's disease (AD) is challenging, particularly because labeled data are scarce. While large language models (LLMs) have shown strong transfer capabilities across domains, adapting them to the AD domain through supervised fine-tuning remains largely unexplored. In this work, we empirically evaluate several model architectures on three heterogeneous transcript corpora (Pitt, CCC, ADRC) to investigate their effectiveness for text-based AD detection and to analyze how task-relevant information is encoded in their internal representations. To the best of our knowledge, our fine-tuned BERT and T5 models establish a new state of the art on the Pitt and CCC datasets while achieving strong performance on ADRC. In parallel, the decoder-only Llama-1B achieves results comparable to BERT and T5 across all three corpora, highlighting its effectiveness for AD detection. We further conduct a comprehensive evaluation of the Llama-1B backbone, analyzing cross-corpus transferability, optimal input chunk-size granularity, and the impact of clinical transcript markers. Finally, we use linear probing to show empirically that fine-tuning shifts the representations of individual tokens, both linguistic markers and content words, in ways that reflect AD-related signal.
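As a rough illustration of the linear probing idea mentioned above, the following minimal sketch fits a logistic-regression probe on fixed feature vectors and reports held-out accuracy. It is not the authors' implementation: the feature vectors are synthetic stand-ins for frozen token representations, the `shift` parameter is a hypothetical way to simulate how strongly a label is linearly encoded, and all function names are illustrative.

```python
import math
import random


def train_linear_probe(features, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression probe with batch gradient descent.

    The feature vectors are treated as frozen: only the probe's
    weights and bias are updated.
    """
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def probe_accuracy(weights, bias, features, labels):
    """Fraction of examples the probe classifies correctly."""
    correct = 0
    for x, y in zip(features, labels):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        correct += int((z > 0) == (y == 1))
    return correct / len(features)


def make_synthetic_representations(n, dim, shift, seed):
    """Synthetic stand-in for token representations (an assumption).

    Labels alternate between 0 and 1; the first coordinate is moved by
    +shift or -shift depending on the label, so a larger shift means the
    label is more linearly decodable.
    """
    rng = random.Random(seed)
    features, labels = [], []
    for i in range(n):
        y = i % 2
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += shift if y == 1 else -shift
        features.append(x)
        labels.append(y)
    return features, labels


def run_probe(shift):
    """Train on one synthetic sample, evaluate on an independent one."""
    train_x, train_y = make_synthetic_representations(200, 4, shift, seed=0)
    test_x, test_y = make_synthetic_representations(200, 4, shift, seed=1)
    weights, bias = train_linear_probe(train_x, train_y)
    return probe_accuracy(weights, bias, test_x, test_y)


if __name__ == "__main__":
    print("no linear signal   :", run_probe(shift=0.0))
    print("strong linear signal:", run_probe(shift=2.0))
```

In a real probing study the synthetic vectors would be replaced by hidden states extracted from the model before and after fine-tuning; a gap in probe accuracy between the two would then suggest that fine-tuning made the label more linearly decodable.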

