Translate-Distill: Learning Cross-Language Dense Retrieval by Translation and Distillation

Prior work on English monolingual retrieval has shown that a cross-encoder trained using a large number of relevance judgments for query-document pairs can be used as a teacher to train more efficient, but similarly effective, dual-encoder student models. Applying a similar knowledge distillation approach to training an efficient dual-encoder model for Cross-Language Information Retrieval (CLIR), where queries and documents are in different languages, is challenging due to the lack of a sufficiently large training collection when the query and document languages differ. The state of the art for CLIR thus relies on translating queries, documents, or both from the large English MS MARCO training set, an approach called Translate-Train. This paper proposes an alternative, Translate-Distill, in which knowledge distillation from either a monolingual cross-encoder or a CLIR cross-encoder is used to train a dual-encoder CLIR student model. This richer design space enables the teacher model to perform inference in an optimized setting, while training the student model directly for CLIR. Trained models and artifacts are publicly available on Huggingface.

翻译：先前关于英语单语检索的研究表明，使用大量查询-文档对的相关性判断训练得到的交叉编码器，可以作为教师模型来训练更高效但效果相近的双编码器学生模型。将类似的知识蒸馏方法应用于跨语言信息检索（CLIR）中训练高效双编码器模型时，由于查询语言与文档语言不同时缺乏足够大的训练集，这一方法面临挑战。当前CLIR的先进技术依赖于从大规模英语MS MARCO训练集翻译查询、文档或两者，即"翻译-训练"方法。本文提出另一种替代方案"翻译-蒸馏"，该方法利用单语交叉编码器或CLIR交叉编码器的知识蒸馏来训练双编码器CLIR学生模型。这种更丰富的设计空间使得教师模型能在优化设置下进行推理，同时直接针对CLIR训练学生模型。训练好的模型与制品已在Huggingface上公开提供。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日