MMEAD, or MS MARCO Entity Annotations and Disambiguations, is a resource of entity links for the MS MARCO datasets. We specify a format for storing and sharing links for both the document and passage collections of MS MARCO. Following this specification, we release entity links to Wikipedia for the documents and passages in both versions of the MS MARCO collections (v1 and v2). The entity links were produced by the REL and BLINK systems. MMEAD is an easy-to-install Python package that lets users load the link data and entity embeddings in only a few lines of code. Finally, we show how MMEAD can be used for IR research that relies on entity information: using this resource, we improve recall@1000 and MRR@10 on more complex queries on the MS MARCO v1 passage dataset. We also demonstrate how entity expansions can be used for interactive search applications.