Question answering (QA) is one of the most common NLP tasks and is closely related to named entity recognition, fact extraction, semantic search, and other fields. In industry, it is valued for chatbots and corporate information systems. It is also a challenging task that drew wide public attention through the quiz show Jeopardy! In this article we describe a Jeopardy!-like Russian QA data set collected from the official Russian quiz database Chgk (che ge ka). The data set includes 379,284 quiz-like questions, of which 29,375 come from "Own Game", the Russian analogue of Jeopardy! We examine the linguistic features of the data set and the related QA task. We conclude by discussing the prospects for a QA competition based on the data collected from this database.