Deep active learning in the presence of outlier examples is a realistic yet challenging scenario. Acquiring unlabeled data for annotation requires a delicate balance between avoiding outliers, to conserve the annotation budget, and prioritizing useful inlier examples, for effective training. In this work, we present an approach built on three highly synergistic components that we identify as key ingredients: joint classifier training with inliers and outliers, semi-supervised learning through pseudo-labeling, and model ensembling. We demonstrate that ensembling significantly enhances the accuracy of pseudo-labeling and improves the quality of data acquisition. Because outliers are properly handled by the joint training process, semi-supervision becomes possible, and using all available unlabeled examples yields a substantial boost in classifier accuracy. Notably, we show that joint training renders explicit outlier detection, a conventional component of acquisition in prior work, unnecessary. The three key components are compatible with numerous existing approaches, and our empirical evaluations show that their combined use improves performance. Despite its simplicity, our proposed approach outperforms all other methods. Code: https://github.com/vladan-stojnic/active-outliers
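To make the interplay between joint training, pseudo-labeling, and ensembling concrete, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). It assumes each ensemble member outputs K+1 logits per unlabeled example, with the last index standing for a single outlier class, and that a pseudo-label is kept only when the averaged probability passes a confidence threshold. The function names, the `threshold` value of 0.9, and the convention of placing the outlier class last are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_pseudo_labels(member_logits, threshold=0.9):
    """Average member probabilities and assign pseudo-labels.

    member_logits[m][n] holds the K+1 logits that ensemble member m
    produces for unlabeled example n. By assumption, the last index
    is the outlier class learned through joint training.

    Returns one (label, is_confident) pair per example, where
    is_confident is True when the averaged top probability reaches
    the (hypothetical) threshold.
    """
    num_members = len(member_logits)
    num_examples = len(member_logits[0])
    results = []
    for n in range(num_examples):
        member_probs = [softmax(member_logits[m][n]) for m in range(num_members)]
        num_classes = len(member_probs[0])
        # Ensemble prediction: mean of the members' class probabilities.
        avg = [
            sum(p[c] for p in member_probs) / num_members
            for c in range(num_classes)
        ]
        label = max(range(num_classes), key=avg.__getitem__)
        results.append((label, avg[label] >= threshold))
    return results
```

In this sketch, members that disagree pull the averaged probability below the threshold, so the example is not pseudo-labeled; this is one simple way ensembling can filter out unreliable pseudo-labels.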