We present KGConv, a large conversational corpus of 71k conversations in which each question-answer pair is grounded in a Wikidata fact. Conversations contain 8.6 questions on average. For each Wikidata fact, we provide multiple variants of the corresponding question (12 on average), produced using templates, human annotations, hand-crafted rules and a neural question rewriting model. We provide baselines for the task of Knowledge-Based Conversational Question Generation. KGConv can also be used for other generation and analysis tasks, such as single-turn question generation from Wikidata triples, question rewriting, question answering from conversations or from knowledge graphs, and quiz generation.
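The following minimal Python sketch makes the corpus structure described above concrete. A conversation is a list of turns, and each turn is grounded in one Wikidata triple and carries several question variants, grouped by how they were produced. The class names, field names and variant-source labels (`template`, `human`, `rule`, `neural`) are hypothetical and are not the released file format. The example triple uses real Wikidata identifiers (Q90 = Paris, P17 = country, Q142 = France).

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass(frozen=True)
class Triple:
    subject: str    # Wikidata entity ID, e.g. "Q90"
    predicate: str  # Wikidata property ID, e.g. "P17"
    object: str     # Wikidata entity ID or literal value


@dataclass
class Turn:
    triple: Triple  # the fact this question-answer pair is grounded in
    answer: str
    # Hypothetical layout: variant source -> question strings from that source
    variants: dict[str, list[str]] = field(default_factory=dict)

    def num_variants(self) -> int:
        # Total number of question variants across all sources
        return sum(len(qs) for qs in self.variants.values())


@dataclass
class Conversation:
    turns: list[Turn]


def corpus_stats(conversations: list[Conversation]) -> tuple[float, float]:
    """Return (mean questions per conversation, mean variants per fact).

    Raises statistics.StatisticsError if the corpus or every conversation is empty.
    """
    questions_per_conv = mean(len(c.turns) for c in conversations)
    variants_per_fact = mean(t.num_variants() for c in conversations for t in c.turns)
    return questions_per_conv, variants_per_fact


# Usage: one turn asking for the country of Paris (Q90 -P17-> Q142)
turn = Turn(
    Triple("Q90", "P17", "Q142"),
    "France",
    {
        "template": ["What is the country of Paris?"],
        "human": ["Which country is Paris in?"],
        "neural": ["In which country is it located?"],  # in-context rewrite
    },
)
print(corpus_stats([Conversation([turn])]))
```

Over the full corpus, the two statistics computed this way would correspond to the averages reported above (8.6 questions per conversation, 12 variants per fact).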