Despite recent progress in semi-supervised learning (SSL), existing methods still fail to use unlabeled data both effectively and efficiently. Many pseudo-label-based methods select unlabeled examples using inaccurate confidence scores from the classifier. Most prior work also uses all available unlabeled data without pruning, which makes it difficult to scale to large amounts of unlabeled data. To address these issues, we propose two methods: Variational Confidence Calibration (VCC) and Influence-Function-based Unlabeled Sample Elimination (INFUSE). VCC is a universal plugin for SSL confidence calibration that uses a variational autoencoder to select more accurate pseudo labels based on three types of consistency scores. INFUSE is a data pruning method that constructs a core dataset of unlabeled examples for SSL. Our methods are effective across multiple datasets and settings, reducing classification error rates and saving training time. Combined, VCC-INFUSE reduces the error rate of FlexMatch on CIFAR-100 by 1.08% while saving nearly half of the training time.
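The abstract does not specify how INFUSE scores unlabeled samples. As a generic illustration of influence-style data pruning (not the paper's procedure), the sketch below uses a common first-order approximation: the Hessian in the influence function is replaced by the identity, so a sample's score is the dot product between its pseudo-labeled loss gradient and a reference gradient. Several choices are assumptions made only for this toy: a binary logistic-regression model, the labeled set serving as the reference set, the 0.5 threshold for pseudo labels, and the hypothetical helper names `prune_unlabeled` and `mean_grad`.

```python
import math


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(a, b) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def logistic_grad(w, x, y):
    """Gradient of binary cross-entropy w.r.t. weights w for one example (x, y), y in {0, 1}."""
    p = sigmoid(dot(w, x))
    return [(p - y) * xi for xi in x]


def mean_grad(w, data):
    """Average loss gradient over a list of (x, y) pairs (hypothetical reference gradient)."""
    grads = [logistic_grad(w, x, y) for x, y in data]
    return [sum(col) / len(grads) for col in zip(*grads)]


def prune_unlabeled(w, labeled, unlabeled, keep):
    """Return sorted indices of the `keep` unlabeled samples with the highest first-order score.

    A gradient step on sample z changes the reference loss by roughly -lr * (g_ref . g_z),
    so a larger dot product means the sample is expected to help more (Hessian taken as identity).
    """
    g_ref = mean_grad(w, labeled)
    scored = []
    for idx, x in enumerate(unlabeled):
        # Assumed pseudo label: the model's own hard prediction at threshold 0.5.
        pseudo = 1 if sigmoid(dot(w, x)) >= 0.5 else 0
        g = logistic_grad(w, x, pseudo)
        scored.append((dot(g_ref, g), idx))
    scored.sort(reverse=True)
    return sorted(idx for _, idx in scored[:keep])


if __name__ == "__main__":
    w = [0.0, 0.0]
    labeled = [([1.0, 1.0], 1)]
    unlabeled = [[1.0, 1.0], [-1.0, -1.0], [0.0, 0.0]]
    print(prune_unlabeled(w, labeled, unlabeled, keep=2))
```

In a real SSL pipeline the gradients would come from a deep network, and dropping the inverse-Hessian term is a significant simplification; the toy only shows the shape of "score every unlabeled sample, keep the top-k as a core set."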