Existing benchmarks for open-domain question answering (ODQA) typically focus on questions whose answers can be extracted from a single paragraph. By contrast, many natural questions, such as "What players were drafted by the Brooklyn Nets?", have a list of answers. Answering such questions requires retrieving and reading many passages from a large corpus. We introduce QAMPARI, an ODQA benchmark in which the answers to questions are lists of entities spread across many paragraphs. We created QAMPARI by (a) generating questions with multiple answers from Wikipedia's knowledge graph and tables, (b) automatically pairing answers with supporting evidence in Wikipedia paragraphs, and (c) manually paraphrasing questions and validating each answer. We train ODQA models from the retrieve-and-read family and find that QAMPARI is challenging for both passage retrieval and answer generation, with models reaching an F1 score of at most 32.8. Our results highlight the need for ODQA models that handle a broad range of question types, including both single-answer and multi-answer questions.
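Because each answer is a list, F1 here balances precision (how many predicted entities are correct) against recall (how many gold entities were found). The following is a minimal sketch, assuming per-question F1 over sets of answer strings with exact matching after light normalization. The function names `normalize` and `list_f1` and the normalization rule are hypothetical. The benchmark's actual scoring, for example alias handling or how scores are averaged across questions, may differ.

```python
def normalize(answer: str) -> str:
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return " ".join(answer.lower().split())


def list_f1(predicted: list[str], gold: list[str]) -> float:
    """Set-based F1 between predicted and gold answer lists (illustrative sketch)."""
    pred = {normalize(a) for a in predicted}
    ref = {normalize(a) for a in gold}
    # Assumption: an empty prediction or empty gold set scores 0.
    if not pred or not ref:
        return 0.0
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)  # fraction of predictions that are correct
    recall = true_positives / len(ref)      # fraction of gold answers recovered
    return 2 * precision * recall / (precision + recall)


# Placeholder entity names, not real draft picks.
print(round(list_f1(["A", "B", "C"], ["B", "C", "D", "E"]), 3))  # 0.571
```

In this example, two of the three predictions are correct (precision 2/3), and two of the four gold answers are found (recall 1/2). That gives F1 = 4/7 ≈ 0.571. A model that returns only one confident answer to a list question is therefore penalized through recall.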