L-MAE: Longitudinal masked auto-encoder with time and severity-aware encoding for diabetic retinopathy progression prediction

Rachid Zeghlache, Pierre-Henri Conze, Mostafa El Habib Daho, Yihao Li, Alireza Rezaei, Hugo Le Boité, Ramin Tadayoni, Pascal Massin, Béatrice Cochener, Ikram Brahim, Gwenolé Quellec, Mathieu Lamard

Pre-training strategies based on self-supervised learning (SSL) have proven effective for many downstream tasks in computer vision. Because medical images differ substantially from natural images, applying typical SSL to medical imaging is not straightforward. In addition, the usual pretext tasks often lack context, which is critical for computer-aided clinical decision support. In this paper, we develop a longitudinal masked auto-encoder (L-MAE) based on the well-known Transformer-based MAE. In particular, we explore the importance of time-aware position embedding and of disease progression-aware masking. Accounting for the time elapsed between examinations, rather than only their order, makes it possible to capture temporal changes and trends. The masking strategy, for its part, evolves during follow-up to better capture pathological changes and thereby support a more accurate assessment of disease progression. Using OPHDIAT, a large follow-up screening dataset targeting diabetic retinopathy (DR), we evaluate the pre-trained weights on a longitudinal task: predicting the severity label of the next visit within 3 years from the time series of past examinations. Our results demonstrate the relevance of both time-aware position embedding and masking strategies based on disease progression knowledge. Compared to popular baseline models and standard longitudinal Transformers, these simple yet effective extensions significantly enhance the predictive ability of deep classification models.
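To make the two ideas above concrete, the following is a minimal sketch, not the formulation used in this work, which the abstract does not specify. It assumes a standard sinusoidal embedding indexed by elapsed days instead of visit order, and a purely hypothetical schedule in which the masking ratio decreases with the DR severity grade. The function names, the parameters `max_period`, `base_ratio`, `step`, and `min_ratio`, and their default values are all illustrative assumptions.

```python
import numpy as np


def time_aware_position_embedding(days, dim, max_period=10000.0):
    """Sinusoidal embedding indexed by elapsed time (assumed formulation).

    days: days elapsed since the first examination, one value per visit.
    Returns an array of shape (n_visits, dim).
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    days = np.asarray(days, dtype=np.float64)
    # One frequency per (sin, cos) pair, as in the usual Transformer encoding.
    freqs = 1.0 / (max_period ** (np.arange(0, dim, 2) / dim))
    angles = days[:, None] * freqs[None, :]
    emb = np.empty((days.shape[0], dim))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb


def severity_aware_mask_ratio(severity, base_ratio=0.75, step=0.05, min_ratio=0.5):
    """Hypothetical schedule: lower the masking ratio as severity increases."""
    return max(min_ratio, base_ratio - step * severity)


def random_patch_mask(n_patches, ratio, rng):
    """Boolean mask with round(ratio * n_patches) patches set to True (masked)."""
    n_masked = int(round(ratio * n_patches))
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.permutation(n_patches)[:n_masked]] = True
    return mask


# Three visits at irregular intervals (in days) and their assumed severity grades.
visit_days = [0, 365, 400]
visit_grades = [0, 1, 3]
embeddings = time_aware_position_embedding(visit_days, dim=8)
rng = np.random.default_rng(0)
masks = [
    random_patch_mask(196, severity_aware_mask_ratio(g), rng) for g in visit_grades
]
```

With an order-based embedding, the second and third visits would sit exactly one position apart, like the first and second. Here their embeddings reflect the 35-day gap, as opposed to the 365-day gap between the first two visits.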
