In this paper, we investigate cross-lingual learning (CLL) for multilingual scene text recognition (STR). CLL transfers knowledge from one language to another. We aim to identify the condition under which knowledge from high-resource languages can be exploited to improve performance on low-resource languages. To do so, we first examine whether two general insights about CLL discussed in previous work apply to multilingual STR: (1) joint learning with high- and low-resource languages may reduce performance on low-resource languages, and (2) CLL works best between typologically similar languages. Through extensive experiments, we show that these two general insights may not apply to multilingual STR. We then show that the crucial condition for CLL is the dataset size of the high-resource languages, regardless of which high-resource languages are used. Our code, data, and models are available at https://github.com/ku21fan/CLL-STR.