The success of self-supervised learning (SSL) has largely been attributed to the availability of large-scale unlabeled datasets. However, in a specialized domain such as medical imaging, which differs substantially from natural images, assuming such data availability is unrealistic: the data are scarce and typically held in small databases collected for specific prognosis tasks. We therefore investigate the applicability of self-supervised learning algorithms to small-scale medical imaging datasets. In particular, we evaluate four state-of-the-art SSL methods on three publicly accessible \emph{small} medical imaging datasets. Our investigation reveals that in-domain low-resource SSL pre-training can yield performance competitive with transfer learning from large-scale datasets (such as ImageNet). Furthermore, we extensively analyse our empirical findings to provide insights that can motivate further research towards circumventing the need for pre-training on a large image corpus. To the best of our knowledge, this is the first attempt to holistically explore self-supervision on low-resource medical datasets.