Social media posts contain a large number of claims, some of which may spread misinformation or fake news. Identifying these claims is therefore a crucial first step towards claim verification, and given the sheer volume of social media posts, this identification must be automated. This competition addresses the task of 'Claim Span Identification': given a text, identify the spans that correspond to claims. This task is more challenging than the traditional binary classification of a text as claim or not-claim, and calls for state-of-the-art methods in Pattern Recognition, Natural Language Processing and Machine Learning. For this competition, we used HECSI, a newly developed dataset of about 8K English posts and about 8K Hindi posts in which human annotators have marked the claim spans. This paper gives an overview of the competition and of the solutions developed by the participating teams.
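To make the contrast with binary classification concrete, here is a minimal sketch. It is not the competition's official format or the HECSI annotation scheme. It assumes whitespace tokenization, uses a made-up post, and marks a hypothetical claim span by character offsets. It then derives both a single post-level label and one label per token, where a token is labelled 1 if it overlaps a claim span.

```python
from typing import List, Tuple


def tokens_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace and record each token's [start, end) character offsets."""
    tokens = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)  # search from the end of the previous token
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens


def span_labels(text: str, spans: List[Tuple[int, int]]) -> List[int]:
    """Label a token 1 if it overlaps any claim span, else 0."""
    labels = []
    for _, start, end in tokens_with_offsets(text):
        inside = any(start < s_end and end > s_start for s_start, s_end in spans)
        labels.append(1 if inside else 0)
    return labels


# Hypothetical post and hypothetical gold claim span (character offsets).
post = "Honestly I read that drinking hot water cures the flu lol"
claim = "drinking hot water cures the flu"
claim_start = post.index(claim)
spans = [(claim_start, claim_start + len(claim))]

# Binary classification: one label for the whole post (claim vs. not-claim).
binary_label = 1 if spans else 0

# Span identification: one label per token, marking where the claim lies.
token_labels = span_labels(post, spans)

for (tok, _, _), lab in zip(tokens_with_offsets(post), token_labels):
    print(f"{tok}\t{lab}")
```

The binary label only says that the post contains a claim. The token labels (`1` for "drinking" through "flu", `0` elsewhere) also say where the claim is, and a system must get these boundaries right.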