The use of propagandistic techniques in online communication has increased in recent years, with the aim of manipulating online audiences. Efforts to automatically detect and debunk such content have addressed various modeling scenarios. These include determining whether the content (text, image, or multimodal) (i) is propagandistic, (ii) employs one or more propagandistic techniques, and (iii) includes techniques with identifiable spans. Considerably more research effort has been devoted to the first two scenarios than to the third. Therefore, in this study, we focus on the task of detecting propagandistic textual spans. We investigate whether large language models such as GPT-4 can perform the task of an annotator. For the experiments, we used an in-house dataset consisting of annotations from multiple annotators. Our results suggest that providing more information to the model in the prompt improves its annotation agreement and performance relative to human annotations. We plan to make the labels from multiple annotators, including GPT-4, available to the community.