Semi-supervised learning (SSL) promises higher accuracy than training classifiers on small labeled datasets alone by also training on many unlabeled images. In real applications like medical imaging, unlabeled data are collected for expediency and are thus uncurated: possibly different from the labeled set in classes or features. Unfortunately, modern deep SSL often makes accuracy worse when given uncurated unlabeled data. Recent complex remedies try to detect out-of-distribution unlabeled images and then discard or downweight them. Instead, we introduce Fix-A-Step, a simpler procedure that views all uncurated unlabeled images as potentially helpful. Our first insight is that even uncurated images can yield useful augmentations of labeled data. Second, we modify gradient descent updates to prevent optimizing a multi-task SSL loss from hurting labeled-set accuracy. Fix-A-Step can repair many common deep SSL methods, improving accuracy on CIFAR benchmarks across all tested methods and levels of artificial class mismatch. On a new medical SSL benchmark called Heart2Heart, Fix-A-Step can learn from 353,500 truly uncurated ultrasound images to deliver gains that generalize across hospitals.
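
To make the second insight concrete, the sketch below shows one way a gradient descent update could be guarded so that the unlabeled term of a multi-task SSL loss does not work against the labeled term. The abstract does not state the exact rule, so the agreement test used here (a positive inner product between the labeled-loss and unlabeled-loss gradients) is an assumption, and the names `guarded_step`, `grad_labeled`, `grad_unlabeled`, `lr`, and `unlabeled_weight` are hypothetical. Gradients are plain lists of floats so the example stays self-contained.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def guarded_step(
    params: Sequence[float],
    grad_labeled: Sequence[float],
    grad_unlabeled: Sequence[float],
    lr: float = 0.1,
    unlabeled_weight: float = 1.0,
) -> Tuple[List[float], bool]:
    """One gradient descent step with an assumed agreement check.

    Assumption (not taken from the abstract): the unlabeled-loss gradient
    is used only when it has a positive inner product with the
    labeled-loss gradient. Otherwise the step follows the labeled-loss
    gradient alone.
    """
    agree = dot(grad_labeled, grad_unlabeled) > 0.0
    if agree:
        # Gradients agree: follow the combined multi-task direction.
        direction = [
            gl + unlabeled_weight * gu
            for gl, gu in zip(grad_labeled, grad_unlabeled)
        ]
    else:
        # Gradients conflict: ignore the unlabeled term for this step.
        direction = list(grad_labeled)
    new_params = [p - lr * d for p, d in zip(params, direction)]
    return new_params, agree


if __name__ == "__main__":
    params = [1.0, 1.0]
    # Agreeing gradients: the combined direction is used.
    print(guarded_step(params, [1.0, 0.0], [1.0, 1.0]))
    # Conflicting gradients: only the labeled gradient is used.
    print(guarded_step(params, [1.0, 0.0], [-1.0, 1.0]))
```

Under this assumed rule, whenever the labeled-loss gradient is nonzero the step direction has a positive inner product with it, so to first order a small enough step does not increase the labeled loss. In a deep learning framework the two gradients would be computed separately by automatic differentiation and flattened over all parameters before the check.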