Text image machine translation (TIMT), which translates source-language text in images into target-language sentences, is widely used in real-world applications. Existing TIMT methods fall into two main categories: the recognition-then-translation pipeline model and the end-to-end model. However, how to transfer knowledge from the pipeline model to the end-to-end model remains an unsolved problem. In this paper, we propose a novel Multi-Teacher Knowledge Distillation (MTKD) method to effectively distill knowledge from the pipeline model into the end-to-end TIMT model. Specifically, three teachers are used to improve the performance of the end-to-end TIMT model. The image encoder in the end-to-end TIMT model is optimized with knowledge distillation guidance from the recognition teacher encoder, while the sequential encoder and decoder are improved by transferring knowledge from the translation sequential encoder and decoder teacher models. Furthermore, both token-level and sentence-level knowledge distillation are incorporated to further boost translation performance. Extensive experimental results show that the proposed MTKD effectively improves text image translation performance and outperforms existing end-to-end and pipeline models while using fewer parameters and less decoding time, illustrating that MTKD can take advantage of both pipeline and end-to-end models.
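The abstract does not give the loss formulas, so the following is only a minimal sketch of how the three distillation signals might be combined. It assumes a mean-squared-error feature loss for the two encoders and a KL-divergence loss between teacher and student output distributions for token-level distillation of the decoder. The function names, the `temperature` parameter, and the `weights` tuple are hypothetical and are not taken from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over one list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def token_level_kd_loss(student_logits, teacher_logits, temperature=1.0):
    """Mean KL(teacher || student) over target positions (assumed form)."""
    total = 0.0
    for s_row, t_row in zip(student_logits, teacher_logits):
        log_t = log_softmax(t_row, temperature)
        log_s = log_softmax(s_row, temperature)
        total += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    return total / len(student_logits)


def feature_mse_loss(student_feats, teacher_feats):
    """Mean squared error between student and teacher features (assumed form)."""
    squared = [
        (s - t) ** 2
        for s_row, t_row in zip(student_feats, teacher_feats)
        for s, t in zip(s_row, t_row)
    ]
    return sum(squared) / len(squared)


def multi_teacher_loss(translation_loss, image_enc_loss, seq_enc_loss,
                       decoder_kd_loss, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the translation loss and three teacher-guided losses.

    The weights are hypothetical hyperparameters, one per teacher.
    """
    w_img, w_seq, w_dec = weights
    return (translation_loss
            + w_img * image_enc_loss
            + w_seq * seq_enc_loss
            + w_dec * decoder_kd_loss)
```

Sentence-level distillation is not shown here. In the common formulation it trains the student on translations decoded by the teacher rather than adding a separate loss term, and whether MTKD does exactly this is not stated in the abstract.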