Semi-supervised learning (SSL) has attracted much attention because it reduces the high cost of collecting adequate well-labeled training data, especially for deep learning methods. However, traditional SSL rests on the assumption that labeled and unlabeled data come from the same distribution, \textit{e.g.}, the same classes and domains. In practical scenarios, unlabeled data may come from unseen classes or unseen domains, and exploiting such data with existing SSL methods remains challenging. In this paper, we therefore propose a unified framework that leverages these unseen unlabeled data for open-scenario semi-supervised medical image classification. We first design a novel scoring mechanism, called dual-path outliers estimation, to identify samples from unseen classes. To detect unseen-domain samples, we apply an effective variational autoencoder (VAE) pre-training. We then conduct domain adaptation to fully exploit the detected unseen-domain samples and boost semi-supervised training. We evaluate the proposed framework on dermatology and ophthalmology tasks. Extensive experiments demonstrate that our model achieves superior classification performance in various medical SSL scenarios. The code is available at: https://github.com/PyJulie/USSL4MIC.
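The abstract does not state how the pre-trained VAE separates unseen-domain samples from the rest. One common reading, assumed here purely for illustration, is that samples with unusually high reconstruction error are flagged as coming from an unseen domain. The sketch below shows only that thresholding step. The function names, the `mean + k * std` rule, and the error values are hypothetical and are not taken from the paper or its repository.

```python
from statistics import mean, stdev


def fit_threshold(in_domain_errors, k=3.0):
    """Return mean + k * (sample std) of errors measured on known in-domain data.

    Assumption: the errors are per-sample VAE reconstruction errors, and the
    mean + k * std rule is an illustrative choice, not the paper's.
    """
    return mean(in_domain_errors) + k * stdev(in_domain_errors)


def split_by_domain(unlabeled_errors, threshold):
    """Partition sample indices into (seen_domain, unseen_domain) lists."""
    seen, unseen = [], []
    for idx, err in enumerate(unlabeled_errors):
        # A sample whose error exceeds the threshold is treated as unseen-domain.
        (unseen if err > threshold else seen).append(idx)
    return seen, unseen


# Hypothetical reconstruction errors, made up for this example.
labeled_errors = [0.10, 0.12, 0.11, 0.09, 0.13]
unlabeled_errors = [0.11, 0.45, 0.10, 0.60]

threshold = fit_threshold(labeled_errors)
seen_idx, unseen_idx = split_by_domain(unlabeled_errors, threshold)
```

Under this assumption, the indices in `unseen_idx` would be the samples passed to the domain adaptation stage, while those in `seen_idx` would go on to the unseen-class scoring step.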