MIRAM: Masked Image Reconstruction Across Multiple Scales for Breast Lesion Risk Prediction

Self-supervised learning (SSL) has attracted substantial interest in the machine learning and computer vision communities. Two prominent SSL approaches are contrastive learning and self-distillation with cropping augmentation. More recently, masked image modeling (MIM), which uses image inpainting as its pretext task, has emerged as a more powerful SSL technique. MIM creates a strong inductive bias toward meaningful spatial and semantic understanding, which allows SSL to contribute not only to classification but also to more complex tasks such as object detection and image segmentation. Building on this progress, this paper introduces a scalable and practical SSL approach built around more challenging pretext tasks that encourage the learning of robust features. Specifically, we use multi-scale image reconstruction from randomly masked input images as the basis for feature learning. We hypothesize that reconstructing high-resolution images pushes the model to attend to finer spatial details, which is particularly useful for discerning subtle structures in medical images. The proposed SSL features improve classification performance on the Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM) dataset. In pathology classification, our method achieves a 3% increase in average precision (AP) and a 1% increase in the area under the receiver operating characteristic curve (AUC) compared with state-of-the-art (SOTA) algorithms. In mass margins classification, it achieves a 4% increase in AP and a 2% increase in AUC.
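
To make the pretext task concrete, the following is a minimal NumPy sketch of random patch masking combined with a reconstruction loss summed over several scales. It is an illustration under stated assumptions, not the authors' implementation: the function names, the scale set (1x and 2x), the 75% mask ratio used in the usage note, the nearest-neighbour upsampling used to build the higher-resolution target, and the choice to score only masked patches are all hypothetical, since the abstract does not specify them.

```python
import numpy as np


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a boolean vector; True marks a masked (hidden) patch."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


def patchify(image, patch):
    """Split an (H, W) image into rows of shape (num_patches, patch * patch)."""
    h, w = image.shape
    gh, gw = h // patch, w // patch
    x = image[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch)
    # Reorder to (grid_row, grid_col, row_in_patch, col_in_patch).
    return x.transpose(0, 2, 1, 3).reshape(gh * gw, patch * patch)


def upsample_nearest(image, factor):
    """Nearest-neighbour upsampling (assumed stand-in for a high-res target)."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)


def multiscale_masked_loss(preds, image, mask, patch, scales=(1, 2)):
    """Sum of per-scale mean squared errors, computed on masked patches only.

    preds maps each scale s to an array of shape
    (num_patches, (patch * s) ** 2), i.e. one predicted patch per grid cell.
    The mask must hide at least one patch.
    """
    total = 0.0
    for s in scales:
        # The patch grid is identical at every scale; only patch size grows.
        target = patchify(upsample_nearest(image, s), patch * s)
        err = (preds[s] - target) ** 2
        total += err[mask].mean()
    return total
```

As a usage note, for an 8x8 image with 4x4 patches there are four grid cells, so `random_patch_mask(4, 0.75, rng)` hides three of them. Passing the true patches as `preds` gives a loss of zero, while any imperfect prediction on a masked patch gives a positive loss. In a real model, `preds` would come from decoder heads operating on the encoded visible patches.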
