C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass

From arXiv. Accepted at ICASSP 2023. The code, models, and dataset are available at https://github.com/roudimit/c2kd.

Multilingual text-video retrieval methods have improved significantly in recent years, but performance in other languages still lags behind English. We propose a Cross-Lingual Cross-Modal Knowledge Distillation method to improve multilingual text-video retrieval. Motivated by the observation that text-video retrieval works better with English text than with text in other languages, we train a student model that takes input text in different languages to match the cross-modal predictions of teacher models that take English input text. We propose a cross-entropy-based objective that forces the distribution over the student's text-video similarity scores to be similar to those of the teacher models. We introduce a new multilingual video dataset, Multi-YouCook2, by translating the English captions in the YouCook2 video dataset into 8 other languages. Our method improves multilingual text-video retrieval performance on Multi-YouCook2 and on several other datasets, such as Multi-MSRVTT and VATEX. We also analyze the effectiveness of different multilingual text models as teachers. The code, models, and dataset are available at https://github.com/roudimit/c2kd.
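The cross-entropy distillation objective described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each batch yields one matrix of student similarity scores (non-English captions against videos) and one matrix per teacher (the English captions against the same videos), that each row is turned into a distribution with a temperature-scaled softmax, and that the loss is averaged over queries and teachers. It covers only the text-to-video direction. The function names and the `temperature` parameter are hypothetical; the linked repository holds the actual code.

```python
import math
from typing import List

# Hypothetical layout: rows are text queries, columns are videos in the batch.
Matrix = List[List[float]]


def softmax(row: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over one row of similarity scores."""
    scaled = [s / temperature for s in row]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(row: List[float], temperature: float = 1.0) -> List[float]:
    """Log-softmax computed with the log-sum-exp trick for stability."""
    scaled = [s / temperature for s in row]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def distillation_loss(
    student_sims: Matrix,
    teacher_sims_list: List[Matrix],
    temperature: float = 1.0,
) -> float:
    """Cross-entropy from each teacher's text-to-video distribution to the student's.

    student_sims[i][j]: score of non-English caption i against video j (student).
    teacher_sims_list[k][i][j]: score of English caption i against video j (teacher k).
    The result is averaged over all queries and all teachers (an assumption).
    """
    total = 0.0
    for teacher_sims in teacher_sims_list:
        for s_row, t_row in zip(student_sims, teacher_sims):
            p_teacher = softmax(t_row, temperature)        # soft targets
            log_q_student = log_softmax(s_row, temperature)  # student log-probs
            total += -sum(p * lq for p, lq in zip(p_teacher, log_q_student))
    return total / (len(student_sims) * len(teacher_sims_list))


if __name__ == "__main__":
    teacher = [[[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]]   # one teacher, 2 queries, 3 videos
    student = [[-1.0, 0.5, 2.0], [1.5, 0.1, 0.3]]     # poorly aligned student scores
    print(distillation_loss(teacher[0], teacher))      # student copies the teacher
    print(distillation_loss(student, teacher))         # larger loss
```

When the student's scores equal the teacher's, the loss reduces to the entropy of the teacher distribution, which is its minimum for that teacher; any mismatch raises it. That gap is what pushes the student's similarity distribution toward the teacher's during training.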

