In the digital age, seeking health advice on the Internet has become a common practice. At the same time, determining the trustworthiness of online medical content is increasingly challenging. Fact-checking has emerged as an approach to assess the veracity of factual claims using evidence from credible knowledge sources. To help advance automated Natural Language Processing (NLP) solutions for this task, in this paper we introduce HealthFC, a novel dataset. It consists of 750 health-related claims in German and English, labeled for veracity by medical experts and backed with evidence from systematic reviews and clinical trials. We provide an analysis of the dataset, highlighting its characteristics and challenges. The dataset can be used for NLP tasks related to automated fact-checking, such as evidence retrieval, claim verification, or explanation generation. For testing purposes, we provide baseline systems based on different approaches, examine their performance, and discuss the findings. We show that the dataset is a challenging test bed with high potential for future use.