Check-worthy claim detection aims to surface potential misinformation for downstream fact-checking systems or human experts to verify. It is a crucial step toward accelerating the fact-checking process. Much effort has gone into identifying check-worthy claims within small sets of pre-collected claims, but how to efficiently detect check-worthy claims directly from a large-scale information source, such as Twitter, remains underexplored. To fill this gap, we introduce MythQA, a new multi-answer open-domain question answering (QA) task that involves contradictory stance mining for query-based, large-scale check-worthy claim detection. The underlying idea is that contradictory claims are a strong indicator of misinformation that merits scrutiny by the appropriate authorities. To study this task, we construct TweetMythQA, an evaluation dataset containing 522 factoid multi-answer questions based on controversial topics. Each question is annotated with multiple answers. Moreover, for each distinct answer we collect relevant tweets and classify them into three categories: "Supporting", "Refuting", and "Neutral". In total, we annotated 5.3K tweets, and contradictory evidence is collected for every answer in the dataset. Finally, we present a baseline system for MythQA and evaluate existing NLP models for each system component on the TweetMythQA dataset. We provide initial benchmarks and identify key challenges for future models to improve upon. Code and data are available at: https://github.com/TonyBY/Myth-QA
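To make the contradictory-stance idea concrete, the following is a minimal sketch built on an assumed rule that is not taken from the paper: a candidate answer is flagged as check-worthy when its retrieved tweets include at least `min_each` "Supporting" and at least `min_each` "Refuting" tweets. The `LabeledTweet` type and the `contradictory_answers` function are hypothetical illustrations and do not reflect the released MythQA code.

```python
from collections import Counter
from dataclasses import dataclass

# The three stance labels used in TweetMythQA annotation.
STANCES = ("Supporting", "Refuting", "Neutral")


@dataclass
class LabeledTweet:
    """Hypothetical record: one tweet retrieved for one candidate answer."""
    answer: str   # candidate answer the tweet was collected for
    text: str     # tweet text
    stance: str   # one of STANCES


def contradictory_answers(tweets, min_each=1):
    """Return answers that have both supporting and refuting tweets.

    Assumed flagging rule (not from the paper): an answer is considered
    contradictory if it has at least `min_each` Supporting and at least
    `min_each` Refuting tweets. Neutral tweets are counted but ignored.
    """
    counts = {}
    for t in tweets:
        if t.stance not in STANCES:
            raise ValueError(f"unknown stance: {t.stance!r}")
        # Per-answer stance histogram; missing stances count as 0.
        counts.setdefault(t.answer, Counter())[t.stance] += 1
    flagged = [
        answer
        for answer, c in counts.items()
        if c["Supporting"] >= min_each and c["Refuting"] >= min_each
    ]
    return sorted(flagged)
```

For example, given tweets where answer "A" has one Supporting and one Refuting tweet while answer "B" has only Supporting and Neutral tweets, `contradictory_answers(tweets)` returns `["A"]`. Raising `min_each` trades recall for stronger evidence of genuine disagreement.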