Different entities sharing the same name can be difficult to distinguish. Handling confusing entity mentions is a crucial skill for language models (LMs). For example, given the question "Where was Michael Jordan educated?" and a set of documents discussing different people named Michael Jordan, can LMs distinguish the entity mentions and generate a cohesive answer to the question? To test this ability, we introduce a new benchmark, AmbigDocs. By leveraging Wikipedia's disambiguation pages, we identify sets of documents belonging to different entities who share an ambiguous name. From these documents, we generate questions that contain an ambiguous name, along with their corresponding sets of answers. Our analysis reveals that current state-of-the-art models often yield ambiguous answers or incorrectly merge information belonging to different entities. We establish an ontology categorizing four types of incomplete answers and automatic evaluation metrics to identify such categories. Our work lays the foundation for future research on reasoning across multiple documents with ambiguous entities.