In natural language question answering over knowledge bases, missing facts, incomplete schema, and limited scope naturally leave many questions unanswerable. While answerability has been explored in other QA settings, it has not been studied for QA over knowledge bases (KBQA). We create GrailQAbility, a new benchmark KBQA dataset with unanswerability, by first identifying the forms of KB incompleteness that make questions unanswerable, and then systematically adapting GrailQA (a popular KBQA dataset with only answerable questions). Experimenting with three state-of-the-art KBQA models, we find that all three suffer a drop in performance even after suitable adaptation for unanswerable questions. In addition, these models often detect unanswerability for the wrong reasons and find specific forms of unanswerability particularly difficult to handle. This underscores the need for further research on making KBQA systems robust to unanswerability.