Weakly supervised learning is a popular approach for training machine learning models in low-resource settings. Instead of requesting high-quality yet costly human annotations, it allows training models with noisy annotations obtained from various weak sources. Recently, many sophisticated approaches have been proposed for robust training under label noise, reporting impressive results. In this paper, we revisit the setup of these approaches and find that their benefits are significantly overestimated. Specifically, we find that the success of existing weakly supervised learning approaches heavily relies on the availability of clean validation samples which, as we show, can be leveraged much more efficiently by simply training on them. After using these clean labels in training, the advantages of using these sophisticated approaches are mostly wiped out. This remains true even when the available clean data is reduced to just five samples per class, making these approaches impractical. To understand the true value of weakly supervised learning, we thoroughly analyze diverse NLP datasets and tasks to ascertain when and why weakly supervised approaches work. Based on our findings, we provide recommendations for future research.
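The abstract contrasts two ways of spending a small clean set: using it only for validation while training on weak labels, or training on it directly, down to five samples per class. The Python below is a toy sketch of that contrast under loudly stated assumptions. It draws k clean samples per class (k = 5, as in the abstract) and compares a model fit on weak labels with one fit on those clean samples. The synthetic 1-D Gaussian data, the random label flips standing in for weak sources, the nearest-centroid model, and the helper names `sample_k_per_class`, `make_toy_data`, `fit_centroids`, and `accuracy` are all hypothetical. They are not the paper's datasets, models, or weak-labeling sources.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def sample_k_per_class(labels: Sequence[int], k: int, seed: int = 0) -> List[int]:
    """Return sorted indices of exactly k examples per class, drawn without replacement."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    chosen: List[int] = []
    for y in sorted(by_class):
        pool = by_class[y]
        if len(pool) < k:
            raise ValueError(f"class {y} has {len(pool)} examples, fewer than k={k}")
        chosen.extend(rng.sample(pool, k))
    return sorted(chosen)


def make_toy_data(n_per_class: int, noise: float, seed: int) -> Tuple[List[float], List[int], List[int]]:
    """Hypothetical 1-D data: class 0 ~ N(0, 1), class 1 ~ N(2, 1).

    Returns features, gold labels, and weak labels flipped with probability
    `noise` (an assumed stand-in for noisy weak sources, not real weak supervision).
    """
    rng = random.Random(seed)
    xs: List[float] = []
    gold: List[int] = []
    weak: List[int] = []
    for y, mu in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            xs.append(rng.gauss(mu, 1.0))
            gold.append(y)
            weak.append(1 - y if rng.random() < noise else y)
    return xs, gold, weak


def fit_centroids(xs: Sequence[float], ys: Sequence[int]) -> Dict[int, float]:
    """Toy model (assumption): per-class mean of the feature, i.e. nearest-centroid."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for x, y in zip(xs, ys):
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in sums}


def accuracy(centroids: Dict[int, float], xs: Sequence[float], ys: Sequence[int]) -> float:
    """Fraction of examples whose nearest centroid matches the gold label."""
    correct = 0
    for x, y in zip(xs, ys):
        pred = min(centroids, key=lambda c: abs(x - centroids[c]))
        correct += int(pred == y)
    return correct / len(ys)


if __name__ == "__main__":
    train_x, _, train_weak = make_toy_data(500, noise=0.3, seed=1)  # weak labels only
    val_x, val_gold, _ = make_toy_data(100, noise=0.0, seed=2)      # clean validation pool
    test_x, test_gold, _ = make_toy_data(500, noise=0.0, seed=3)    # held-out gold test set

    # Setting A: train on weakly labeled data only.
    weak_model = fit_centroids(train_x, train_weak)

    # Setting B: train directly on k = 5 clean validation samples per class.
    idx = sample_k_per_class(val_gold, k=5, seed=0)
    clean_model = fit_centroids([val_x[i] for i in idx], [val_gold[i] for i in idx])

    print("weak labels        :", accuracy(weak_model, test_x, test_gold))
    print("5 clean per class  :", accuracy(clean_model, test_x, test_gold))
```

Which setting scores higher in this toy depends entirely on the assumed noise model and classifier. It illustrates the two uses of clean data being compared and is not meant to reproduce the paper's findings.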