With the rise of social media, users are exposed to many misleading claims. However, the pervasive noise in these posts makes it difficult to identify the precise and prominent claims that require verification. Extracting the important claims from such posts is arduous and time-consuming, yet the problem remains underexplored. Here, we aim to bridge this gap. We introduce a novel task, Claim Normalization (ClaimNorm), which aims to decompose complex and noisy social media posts into simpler and more understandable forms, termed normalized claims. We propose CACN, a pioneering approach that leverages chain-of-thought reasoning and claim check-worthiness estimation, mimicking human reasoning processes, to comprehend intricate claims. Moreover, we capitalize on the in-context learning capabilities of large language models to provide guidance and to improve claim normalization. To evaluate the effectiveness of our proposed model, we meticulously compile a comprehensive real-world dataset, CLAN, comprising more than 6k social media posts paired with their respective normalized claims. Our experiments demonstrate that CACN outperforms several baselines across various evaluation measures. Finally, our rigorous error analysis highlights CACN's capabilities and pitfalls.