Existing benchmarks for open-domain question answering (ODQA) typically focus on questions whose answers can be extracted from a single paragraph. By contrast, many natural questions, such as "What players were drafted by the Brooklyn Nets?", have a list of answers. Answering such questions requires retrieving and reading many passages from a large corpus. We introduce QAMPARI, an ODQA benchmark in which the answer to each question is a list of entities spread across many paragraphs. We created QAMPARI by (a) generating questions with multiple answers from Wikipedia's knowledge graph and tables, (b) automatically pairing answers with supporting evidence in Wikipedia paragraphs, and (c) manually paraphrasing questions and validating each answer. We train ODQA models from the retrieve-and-read family and find that QAMPARI is challenging in terms of both passage retrieval and answer generation: the best model reaches an F1 score of only 32.8. Our results highlight the need for ODQA models that handle a broad range of question types, including both single-answer and multi-answer questions.
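
To make concrete what an F1 score means when the answer is a list, the following is a minimal sketch of a set-based precision, recall, and F1 computation for one question. It is an illustration under stated assumptions, not necessarily the benchmark's exact metric: the function names `normalize` and `list_f1`, the lowercase-and-whitespace normalization, and the placeholder answers are all hypothetical.

```python
from typing import Iterable, Tuple


def normalize(answer: str) -> str:
    """Assumed normalization: lowercase and collapse whitespace."""
    return " ".join(answer.lower().split())


def list_f1(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) between a predicted and a gold answer list.

    Both lists are treated as sets of normalized strings, so duplicates
    and ordering do not affect the score.
    """
    pred_set = {normalize(a) for a in predicted}
    gold_set = {normalize(a) for a in gold}
    if not pred_set or not gold_set:
        return 0.0, 0.0, 0.0

    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    if true_positives == 0:
        return precision, recall, 0.0

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder answers (hypothetical, not taken from the benchmark).
gold_answers = ["Player A", "Player B", "Player C", "Player D"]
predicted_answers = ["player a", "Player B", "Player X"]

precision, recall, f1 = list_f1(predicted_answers, gold_answers)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this example two of the three predictions are correct and two of the four gold answers are found, so a model is penalized both for spurious answers and for answers it misses, which is what makes list questions harder to score well on than single-answer questions.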