Training with noisy class labels impairs neural networks' generalization performance. In this context, mixup is a popular regularization technique that improves training robustness by making it more difficult to memorize false class labels. However, mixup neglects that multiple annotators, e.g., crowdworkers, typically provide the class labels. Therefore, we propose an extension of mixup that handles multiple class labels per instance while considering which class label originates from which annotator. Integrated into our multi-annotator classification framework annot-mix, it outperforms eight state-of-the-art approaches on eleven datasets with noisy class labels provided by either human or simulated annotators. Our code is publicly available in our repository at https://github.com/ies-research/annot-mix.