Imaging-derived phenotypes (IDPs) summarize multi-organ physiology but provide only static snapshots of diseases that evolve over time. In contrast, longitudinal electronic health records encode disease trajectories through temporal dependencies among past diagnosis events and comorbidity structure. We hypothesize that IDPs and disease trajectories contain partially shared disease-relevant structure. We propose a trajectory-aware distillation framework that transfers structural knowledge from a generative disease trajectory Transformer into an organ-wise IDP encoder. A population-scale trajectory model trained on longitudinal diagnosis sequences produces subject-level embeddings that supervise IDP representation learning via geometry-preserving alignment. During downstream prediction, trajectory and imaging representations can also be fused via cross-attention. Across 159 diseases in the UK Biobank cohort, trajectory-aware pretraining consistently improves both discrimination (AUC) and time-to-onset prediction (MAE), with the largest gains for low-prevalence diseases. Similarity relationships in IDP embedding space also align with those in trajectory space, providing supportive evidence for partially aligned representation geometry. These results suggest that population-scale generative disease models can serve as structural priors for data-limited imaging modalities, improving robustness under realistic cohort constraints.
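The abstract does not specify the form of the geometry-preserving alignment objective. As an illustration only, the sketch below assumes one common choice: matching the pairwise cosine-similarity structure of the IDP (student) embeddings to that of the trajectory (teacher) embeddings within a batch of subjects. The function names, the cosine similarity, and the squared-error penalty are assumptions made for this example, not the authors' stated method.

```python
import numpy as np


def cosine_similarity_matrix(x):
    """Pairwise cosine similarities between the rows of x (shape: n x d)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero-norm rows
    return unit @ unit.T


def geometry_alignment_loss(idp_emb, traj_emb):
    """Hypothetical similarity-matching loss (an assumption, not the paper's loss).

    idp_emb:  (n_subjects, d_student) embeddings from the IDP encoder.
    traj_emb: (n_subjects, d_teacher) embeddings from the trajectory model.
    The two embedding widths may differ, because only the subject-by-subject
    similarity matrices are compared.
    """
    s_student = cosine_similarity_matrix(idp_emb)
    s_teacher = cosine_similarity_matrix(traj_emb)
    n = s_student.shape[0]
    off_diag = ~np.eye(n, dtype=bool)  # self-similarity is always 1, so skip it
    return float(np.mean((s_student[off_diag] - s_teacher[off_diag]) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    traj = rng.normal(size=(8, 16))  # stand-in teacher embeddings
    idp = rng.normal(size=(8, 4))    # stand-in student embeddings
    print("loss for unrelated embeddings:", geometry_alignment_loss(idp, traj))
    print("loss for identical geometry:", geometry_alignment_loss(traj, traj))
```

Because a loss of this kind compares only relative similarities between subjects, it is unchanged by rotating or rescaling either embedding space. This is one way to read "geometry-preserving": the student is asked to reproduce the teacher's neighborhood structure rather than its coordinates.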