Paper Title: LoV3D: Grounding Cognitive Prognosis Reasoning in Longitudinal 3D Brain MRI via Regional Volume Assessments

Longitudinal brain MRI is essential for characterizing the progression of neurological diseases such as Alzheimer's disease. However, current deep-learning tools fragment this process: classifiers reduce a scan to a label, volumetric pipelines produce uninterpreted measurements, and vision-language models (VLMs) may generate fluent but potentially hallucinated conclusions. We present LoV3D, a pipeline for training 3D vision-language models, which reads longitudinal T1-weighted brain MRI, produces a region-level anatomical assessment, compares the current scan with the prior scan, and finally outputs a three-class diagnosis (Cognitively Normal, Mild Cognitive Impairment, or Dementia) along with a synthesized diagnostic summary. This stepwise pipeline grounds the final diagnosis by enforcing label consistency, longitudinal coherence, and biological plausibility, thereby reducing the risk of hallucination. Training introduces a clinically weighted Verifier that automatically scores candidate outputs against normative references derived from standardized volume metrics, driving Direct Preference Optimization without any human annotation. On a subject-level held-out ADNI test set (479 scans, 258 subjects), LoV3D achieves 93.7% three-class diagnostic accuracy (+34.8% over the no-grounding baseline), 97.2% two-class diagnostic accuracy (+4% over the SOTA), and 82.6% region-level anatomical classification accuracy (+33.1% over VLM baselines). Zero-shot transfer yields 95.4% accuracy on MIRIAD (100% Dementia recall) and 82.9% three-class accuracy on AIBL, indicating strong generalizability across sites, scanners, and populations. Code is available at https://github.com/Anonymous-TEVC/LoV-3D.
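To make the Verifier idea concrete, the following is a minimal sketch of how automatically scored candidates could be turned into preference pairs for Direct Preference Optimization. It is an illustration under stated assumptions, not the authors' implementation: the region names, the weights in `WEIGHTS`, the z-score threshold, and the functions `reference_label`, `verifier_score`, and `preference_pairs` are all hypothetical, and the abstract does not specify how the actual Verifier combines its checks.

```python
from itertools import combinations

# Hypothetical clinical weights per region (assumption: regions considered
# more diagnostically relevant get larger weights).
WEIGHTS = {"hippocampus": 2.0, "ventricles": 1.0}


def reference_label(z: float, threshold: float = 1.5) -> str:
    """Map a normative z-score to a coarse label (threshold is illustrative)."""
    return "abnormal" if abs(z) >= threshold else "normal"


def verifier_score(candidate: dict, z_scores: dict) -> float:
    """Weighted fraction of regions where the candidate matches the reference."""
    total = sum(WEIGHTS[region] for region in z_scores)
    agree = sum(
        WEIGHTS[region]
        for region, z in z_scores.items()
        if candidate.get(region) == reference_label(z)
    )
    return agree / total


def preference_pairs(candidates: list, z_scores: dict, margin: float = 0.0) -> list:
    """Build (chosen, rejected) pairs from candidates whose scores differ."""
    scored = [(verifier_score(c, z_scores), c) for c in candidates]
    pairs = []
    for (score_a, a), (score_b, b) in combinations(scored, 2):
        if abs(score_a - score_b) > margin:
            chosen, rejected = (a, b) if score_a > score_b else (b, a)
            pairs.append({"chosen": chosen, "rejected": rejected})
    return pairs


# Example: standardized volumes for one scan and two candidate assessments.
z = {"hippocampus": -2.1, "ventricles": 0.3}
good = {"hippocampus": "abnormal", "ventricles": "normal"}
bad = {"hippocampus": "normal", "ventricles": "normal"}
pairs = preference_pairs([good, bad], z)
```

In this sketch the preference signal comes entirely from agreement with the volumetric references, which is one way the "no human annotation" property described above could be realized; a fuller Verifier would presumably also check label consistency and longitudinal coherence.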
