Label noise refers to errors in training labels caused by cheap data annotation methods, such as web scraping or crowd-sourcing, and it can be detrimental to the performance of supervised classifiers. Several methods have been proposed to counteract the effect of random label noise in supervised classification, and some studies have shown that BERT is already robust against high rates of randomly injected label noise. However, real-world label noise is not random; rather, it is often correlated with input features or other annotator-specific factors. In this paper, we evaluate BERT in the presence of two types of realistic label noise: feature-dependent label noise and synthetic label noise from annotator disagreements. We show that the presence of these types of noise significantly degrades BERT classification performance. To improve robustness, we evaluate different types of ensembles and noise-cleaning methods and compare their effectiveness against label noise across different datasets.
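To make the contrast between random and feature-dependent label noise concrete, the following minimal Python sketch injects both kinds into a list of integer labels. It is an illustration under stated assumptions, not the noise-injection procedure used in this paper. `uniform_noise` and `feature_dependent_noise` are hypothetical helpers. The per-example `scores` are an assumed stand-in for any input-derived signal (for instance, a weak classifier's uncertainty), and the fixed `(y + 1) % num_classes` flip target is an arbitrary choice.

```python
import random
from collections.abc import Sequence


def uniform_noise(labels: Sequence[int], num_classes: int, rate: float,
                  seed: int = 0) -> list[int]:
    """Random noise: flip each label independently with probability `rate`."""
    rng = random.Random(seed)
    noisy = list(labels)
    for i, y in enumerate(noisy):
        if rng.random() < rate:
            # Pick a wrong class uniformly, regardless of the input.
            noisy[i] = rng.choice([c for c in range(num_classes) if c != y])
    return noisy


def feature_dependent_noise(labels: Sequence[int], scores: Sequence[float],
                            num_classes: int, rate: float) -> list[int]:
    """Feature-dependent noise: corrupt the examples with the highest scores.

    `scores` is a hypothetical per-example signal derived from the input;
    a higher score means the example is more likely to be mislabeled.
    """
    n = len(labels)
    k = round(rate * n)  # total number of labels to corrupt
    # Rank examples by score, highest first; only the top k are flipped.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    noisy = list(labels)
    for i in order[:k]:
        # Hypothetical fixed confusion: map each class to the next one.
        noisy[i] = (noisy[i] + 1) % num_classes
    return noisy


labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4, 0.5, 0.05]
print(feature_dependent_noise(labels, scores, num_classes=2, rate=0.3))
# → [1, 1, 1, 1, 1, 1, 0, 1, 0, 1]
```

With uniform noise every example is equally likely to be corrupted. With the feature-dependent variant, the corrupted examples concentrate where the score is high (here, only class-0 examples are flipped), so the errors are correlated with the input rather than spread evenly.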