Chest X-ray (CXR) interpretation often requires longitudinal comparison to assess disease progression. Existing approaches typically rely on temporal feature fusion or inter-study discrepancy modeling, yet they remain limited in capturing subtle progression semantics and overlook the inherently directional nature of disease trajectories. In this paper, we propose ProTrans, a novel vision-language pretraining framework that formulates disease progression as a directional semantic transition between paired CXR studies. ProTrans leverages radiology reports to anchor individual CXR representations within interpretable disease states, and introduces a learnable progression feature map that explicitly encodes semantic shifts between states and is aligned with report-derived progression descriptions. To enforce direction-aware perception, ProTrans incorporates a reversed temporal modeling process and imposes bidirectional reconstruction consistency across states and transitions, thereby disentangling directional semantics and promoting coherent trajectory modeling. Extensive experiments on longitudinal downstream tasks, including disease progression classification and progression captioning, demonstrate that ProTrans consistently outperforms existing methods, establishing a unified pretraining framework for longitudinal CXR understanding. Project repository: https://github.com/RPIDIAL/ProTrans