Self-Supervised Learning for Pre-training Capsule Networks: Overcoming Medical Imaging Dataset Challenges

Deep learning techniques are increasingly being adopted in diagnostic medical imaging. However, high-quality, large-scale medical datasets are scarce, which often makes transfer learning necessary. This study investigates self-supervised learning methods for pre-training capsule networks in polyp diagnostics for colon cancer. We used the PICCOLO dataset, comprising 3,433 samples, which exemplifies typical challenges in medical datasets: small size, class imbalance, and distribution shifts between data splits. Capsule networks offer inherent interpretability due to their architecture and inter-layer information routing mechanism. However, they have limited native support in mainstream deep learning frameworks, and pre-trained versions are lacking. This is particularly limiting when training them on small medical datasets, where initialising from pre-trained weights would be beneficial. We explored two auxiliary self-supervised learning tasks, colourisation and contrastive learning, for capsule network pre-training, and compared the self-supervised pre-trained models against alternative initialisation strategies. Our findings suggest that contrastive learning and in-painting techniques are suitable auxiliary tasks for self-supervised learning in the medical domain. These techniques helped guide the model to capture visual features that are beneficial for the downstream task of polyp classification, increasing its accuracy by 5.26% compared to other weight initialisation methods.
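
To make the contrastive pre-training idea concrete, the following is a minimal sketch of one common contrastive objective, the SimCLR-style NT-Xent loss. The abstract does not state which contrastive formulation was used, so this choice is an assumption, as are the function names (`cosine`, `nt_xent_loss`) and the temperature value. It uses plain Python lists in place of capsule network embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """Mean NT-Xent loss over 2N embeddings.

    view_a[i] and view_b[i] are assumed to be embeddings of two
    augmentations of the same image (a positive pair); every other
    embedding in the batch is treated as a negative.
    """
    n = len(view_a)
    z = view_a + view_b  # all 2N embeddings in one list
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of i
        # Scaled similarities to every other embedding (self excluded).
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        positive = cosine(z[i], z[j]) / temperature
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - positive
    return total / (2 * n)


# Toy usage: two images, two views each, with 2-D embeddings.
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_b = [[0.9, 0.1], [0.1, 0.9]]      # views agree with view_a
mismatched_b = [[0.1, 0.9], [0.9, 0.1]]   # views swapped between images
loss_aligned = nt_xent_loss(view_a, aligned_b)
loss_mismatched = nt_xent_loss(view_a, mismatched_b)
```

In this toy case the loss is lower when the two views of each image have similar embeddings, which is the signal a pre-training run would minimise before the weights are reused for polyp classification.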
