Noise in Relation Classification Dataset TACRED: Characterization and Reduction

The overarching objective of this paper is two-fold: first, to explore model-based approaches to characterize the primary cause of the noise in the relation extraction (RE) dataset TACRED; second, to identify the potentially noisy instances. Towards the first objective, we analyze the predictions and performance of state-of-the-art (SOTA) models to identify the root cause of noise in the dataset. Our analysis of TACRED shows that the majority of the noise originates from instances labeled as no-relation, which are negative examples. For the second objective, we explore two nearest-neighbor-based strategies to automatically identify potentially noisy examples for elimination and reannotation. Our first strategy, referred to as the Intrinsic Strategy (IS), is based on the assumption that positive examples are clean; it therefore uses false-negative predictions to identify noisy negative examples. Our second strategy, referred to as the Extrinsic Strategy (ES), instead uses a clean subset of the dataset to identify potentially noisy negative examples. Finally, we retrain the SOTA models on the eliminated and reannotated datasets. Our empirical results based on two SOTA models trained on TACRED-E, the dataset obtained by elimination following the IS, show an average 4% F1-score improvement, whereas reannotation (TACRED-R) does not improve on the original results. Following the ES, however, SOTA models show average F1-score improvements of 3.8% and 4.4% when trained on the eliminated (TACRED-EN) and reannotated (TACRED-RN) datasets, respectively. We further extend the ES to clean positive examples as well, which results in average performance improvements of 5.8% and 5.6% for the eliminated (TACRED-ENP) and reannotated (TACRED-RNP) datasets, respectively.
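To make the nearest-neighbor idea behind the IS concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each example is already represented by a fixed-size vector, that similarity is measured by cosine similarity, and that a negative example is flagged when it is among the k nearest neighbors of a false-negative prediction and exceeds a similarity threshold. The function name `flag_noisy_negatives`, the parameters `k` and `min_sim`, and the toy vectors are all hypothetical; the abstract does not specify any of them.

```python
import numpy as np

def flag_noisy_negatives(seed_vecs, seed_labels, neg_vecs, k=1, min_sim=0.9):
    """Map indices of suspicious negatives to the label of their closest seed.

    seed_vecs:   vectors of false-negative predictions (assumed representation)
    seed_labels: gold positive labels of those seeds
    neg_vecs:    vectors of training examples labeled no_relation
    """
    # Normalize rows so that a dot product equals cosine similarity.
    seeds = seed_vecs / np.linalg.norm(seed_vecs, axis=1, keepdims=True)
    negs = neg_vecs / np.linalg.norm(neg_vecs, axis=1, keepdims=True)
    sims = seeds @ negs.T  # shape: (n_seeds, n_negatives)

    flagged, best_sim = {}, {}
    for s in range(sims.shape[0]):
        # Indices of the k negatives most similar to this seed.
        for j in np.argsort(-sims[s])[:k]:
            j = int(j)
            sim = float(sims[s, j])
            # Keep the label of the most similar seed for each flagged negative.
            if sim >= min_sim and sim > best_sim.get(j, -1.0):
                best_sim[j] = sim
                flagged[j] = seed_labels[s]
    return flagged

# Toy data: two seeds and four negatives in a 2-d space (illustrative only).
seed_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
seed_labels = ["per:title", "org:founded_by"]
neg_vecs = np.array([[0.99, 0.05], [0.5, 0.5], [0.02, 1.0], [-1.0, 0.0]])

flagged = flag_noisy_negatives(seed_vecs, seed_labels, neg_vecs)

# Elimination: drop the flagged negatives.
kept = [i for i in range(len(neg_vecs)) if i not in flagged]
# Reannotation: replace no_relation with the label of the nearest seed.
relabeled = [flagged.get(i, "no_relation") for i in range(len(neg_vecs))]
```

Under these assumptions, elimination and reannotation differ only in what happens to the flagged indices: the first discards them, while the second keeps them with a new label. The ES could be sketched the same way by drawing the seed vectors from a clean subset rather than from false-negative predictions.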
