Semi-Supervised Disease Classification based on Limited Medical Image Data

Learning from positive and unlabeled examples (PU learning) has advanced considerably in recent years, particularly for image and text classification. Applying PU learning to semi-supervised disease classification remains difficult, however, primarily because labeled medical images are scarce. Unlike natural images, medical images typically come with few annotations but an abundance of unlabeled cases, and medical image-aided diagnosis algorithms still face numerous theoretical and practical obstacles. Research on PU learning for medical image-assisted diagnosis is important because it aims to reduce the time that experts spend classifying images. To address these challenges, this paper introduces a novel generative model inspired by Hölder divergence, designed specifically for semi-supervised disease classification from positive and unlabeled medical image data. We present a comprehensive formulation of the problem and establish its theoretical feasibility through rigorous mathematical analysis. To evaluate the proposed approach, we conduct extensive experiments on five benchmark datasets commonly used in PU medical learning: BreastMNIST, PneumoniaMNIST, BloodMNIST, OCTMNIST, and AMD. The experimental results show that our method outperforms existing approaches based on KL divergence and achieves state-of-the-art performance on all five disease classification benchmarks. By addressing the constraints of limited labeled data and exploiting unlabeled medical images, the proposed generative model offers a promising direction for semi-supervised disease classification in medical image analysis.
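For readers unfamiliar with the two divergences contrasted above, the following is a minimal sketch on discrete distributions. It is an illustration only: the abstract does not state which form of Hölder divergence the model uses, so the sketch assumes the Hölder pseudo-divergence with conjugate exponents known from the information-geometry literature. The function names and the `alpha` parameter are hypothetical and are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def holder_pseudo_divergence(p, q, alpha=2.0):
    """Hölder pseudo-divergence (assumed form, not the paper's definition).

    D(p : q) = -log( sum(p * q) / (||p||_alpha * ||q||_beta) ),
    where beta is the conjugate exponent: 1/alpha + 1/beta = 1.
    Hölder's inequality guarantees the result is non-negative.
    """
    beta = alpha / (alpha - 1.0)  # conjugate exponent of alpha
    inner = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = sum(pi ** alpha for pi in p) ** (1.0 / alpha)
    norm_q = sum(qi ** beta for qi in q) ** (1.0 / beta)
    return -math.log(inner / (norm_p * norm_q))


if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]
    q = [0.1, 0.3, 0.6]
    print("KL(p || q)     =", kl_divergence(p, q))
    print("Hölder(p : q)  =", holder_pseudo_divergence(p, q, alpha=2.0))
```

Under this assumed form, the divergence is projective: rescaling either argument by a positive constant leaves the value unchanged, which is not true of KL divergence. With `alpha = 2` the expression coincides with the Cauchy-Schwarz divergence and vanishes when both arguments are equal.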
