Online propaganda poses a severe threat to the integrity of societies. However, existing datasets for detecting online propaganda have a key limitation: they were annotated using weak labels that can be noisy and even incorrect. To address this limitation, our work makes the following contributions: (1) We present HQP, a novel dataset (N=30,000) for detecting online propaganda with high-quality labels. To the best of our knowledge, HQP is the first dataset for detecting online propaganda that was created through human annotation. (2) We show empirically that state-of-the-art language models fail to detect online propaganda when trained with weak labels (AUC: 64.03). In contrast, state-of-the-art language models can accurately detect online propaganda when trained with our high-quality labels (AUC: 92.25), a relative improvement of ~44%. (3) To address the cost of labeling, we extend our work to few-shot learning. Specifically, we show that prompt-based learning using a small sample of high-quality labels can still achieve reasonable performance (AUC: 80.27). Finally, we discuss implications for the NLP community on how to balance the cost and quality of labeling. Crucially, our work highlights the importance of high-quality labels for sensitive NLP tasks such as propaganda detection.
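
As a quick check on the reported gain, the ~44% figure is consistent with a relative improvement computed from the two AUC values stated above. This reading is an assumption, since the abstract does not spell out the formula:

```latex
\[
\frac{\mathrm{AUC}_{\text{high-quality}} - \mathrm{AUC}_{\text{weak}}}{\mathrm{AUC}_{\text{weak}}}
= \frac{92.25 - 64.03}{64.03}
= \frac{28.22}{64.03}
\approx 0.44
\]
```

In absolute terms, the same gap is 28.22 AUC points.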