Longitudinal brain MRI is essential for characterizing the progression of neurological diseases such as Alzheimer's disease. However, current deep-learning tools fragment this process: classifiers reduce a scan to a single label, volumetric pipelines produce measurements without interpretation, and vision-language models (VLMs) can generate fluent but hallucinated conclusions. We present LoV3D, a pipeline for training 3D vision-language models. The resulting model reads longitudinal T1-weighted brain MRI, produces a region-level anatomical assessment, compares the current scan with the prior scan, and finally outputs a three-class diagnosis (Cognitively Normal, Mild Cognitive Impairment, or Dementia) together with a synthesized diagnostic summary. This stepwise design grounds the final diagnosis by enforcing label consistency, longitudinal coherence, and biological plausibility, thereby reducing the risk of hallucination. Training introduces a clinically weighted Verifier that automatically scores candidate outputs against normative references derived from standardized volume metrics, and these scores drive Direct Preference Optimization without any human annotation. On a subject-level held-out ADNI test set (479 scans, 258 subjects), LoV3D achieves 93.7% three-class diagnostic accuracy (+34.8% over the no-grounding baseline), 97.2% two-class diagnostic accuracy (+4% over the prior state of the art), and 82.6% region-level anatomical classification accuracy (+33.1% over VLM baselines). Zero-shot transfer yields 95.4% accuracy on MIRIAD (100% Dementia recall) and 82.9% three-class accuracy on AIBL, indicating strong generalization across sites, scanners, and populations. Code is available at https://github.com/Anonymous-TEVC/LoV-3D.
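To make the Verifier-to-preference step concrete, the following is a minimal sketch of how automatically scored candidates could be turned into a (chosen, rejected) pair for Direct Preference Optimization. It is an illustration under stated assumptions, not the LoV3D implementation: the region names, the weights in `REGION_WEIGHTS`, the z-score threshold, the two-label scheme, and all function names are hypothetical and do not come from the abstract.

```python
# Hypothetical clinical weights: regions assumed to matter more for
# Alzheimer's disease contribute more to the score (values are illustrative).
REGION_WEIGHTS = {"hippocampus": 3.0, "entorhinal": 2.0, "amygdala": 1.0}


def reference_label(z_score: float, threshold: float = -1.5) -> str:
    """Map a normative volume z-score to a reference label.

    The threshold of -1.5 is an assumption made for this sketch.
    """
    return "atrophy" if z_score <= threshold else "normal"


def verifier_score(candidate: dict[str, str], z_scores: dict[str, float]) -> float:
    """Weighted fraction of regions where the candidate matches the reference."""
    total = sum(REGION_WEIGHTS[region] for region in z_scores)
    agreed = sum(
        REGION_WEIGHTS[region]
        for region, z in z_scores.items()
        if candidate.get(region) == reference_label(z)
    )
    return agreed / total


def preference_pair(candidates: list[dict[str, str]], z_scores: dict[str, float]):
    """Return (chosen, rejected): the highest- and lowest-scoring candidates.

    Returns None when all candidates tie, since a tie carries no preference signal.
    """
    ranked = sorted(candidates, key=lambda c: verifier_score(c, z_scores))
    rejected, chosen = ranked[0], ranked[-1]
    if verifier_score(chosen, z_scores) == verifier_score(rejected, z_scores):
        return None
    return chosen, rejected


# Usage: three candidate region-level assessments for one scan.
z = {"hippocampus": -2.0, "entorhinal": -0.5, "amygdala": -1.8}
cand_a = {"hippocampus": "atrophy", "entorhinal": "normal", "amygdala": "atrophy"}
cand_b = {"hippocampus": "normal", "entorhinal": "normal", "amygdala": "atrophy"}
cand_c = {"hippocampus": "normal", "entorhinal": "normal", "amygdala": "normal"}
pair = preference_pair([cand_a, cand_b, cand_c], z)
```

In this toy case `cand_a` agrees with every reference label and `cand_c` misses the two atrophic regions, so the pair is `(cand_a, cand_c)`. Pairs of this shape are what a preference-optimization trainer would consume, with no human ranking involved.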