We present the systems we submitted to the VLSP 2022 machine translation shared task. This year, we participated in both translation directions, i.e., Chinese-Vietnamese and Vietnamese-Chinese. We build our systems on the Transformer architecture, using the powerful multilingual denoising pre-trained model mBART. The systems are further enhanced by a sampling-based back-translation method that leverages large-scale available monolingual data. Additionally, we apply several other techniques, including ensembling and post-processing, to improve translation quality. On the public test sets, our systems achieve 38.9 BLEU on Chinese-Vietnamese and 38.0 BLEU on Vietnamese-Chinese, outperforming several strong baselines.
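The abstract does not say which sampling scheme is used for back-translation. The sketch below is a minimal illustration that assumes the common unrestricted-sampling variant described by Edunov et al. (2018). In that variant, a reverse (target-to-source) model produces a synthetic source for each target-side monolingual sentence by drawing every token from its output distribution, rather than by taking the argmax. The `NextTokenProbs` interface, the functions `sample_translation` and `back_translate`, and `toy_reverse_model` are all hypothetical stand-ins. They are not the authors' code and not any real library's API.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model interface: given the source tokens and the target prefix
# decoded so far, return a probability for every candidate next token.
NextTokenProbs = Callable[[List[str], List[str]], Dict[str, float]]


def sample_translation(
    src: List[str],
    next_token_probs: NextTokenProbs,
    max_len: int = 50,
    eos: str = "</s>",
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Decode one synthetic translation by unrestricted sampling."""
    rng = rng or random.Random()
    out: List[str] = []
    for _ in range(max_len):
        probs = next_token_probs(src, out)
        tokens = list(probs)
        weights = [probs[t] for t in tokens]
        # Draw the next token in proportion to its probability instead of
        # taking the most likely token as greedy or beam search would.
        tok = rng.choices(tokens, weights=weights, k=1)[0]
        if tok == eos:
            break
        out.append(tok)
    return out


def back_translate(
    target_monolingual: List[List[str]],
    reverse_model: NextTokenProbs,
    seed: int = 0,
) -> List[Tuple[List[str], List[str]]]:
    """Pair each monolingual target sentence with a sampled synthetic source."""
    rng = random.Random(seed)
    return [
        (sample_translation(tgt, reverse_model, rng=rng), tgt)
        for tgt in target_monolingual
    ]


def toy_reverse_model(src: List[str], prefix: List[str]) -> Dict[str, float]:
    # Toy stand-in for a trained reverse model: it emits at most len(src)
    # tokens and then forces end-of-sentence.
    if len(prefix) >= len(src):
        return {"</s>": 1.0}
    return {"a": 0.5, "b": 0.3, "</s>": 0.2}


# Usage: build synthetic (source, target) pairs from target-side monolingual text.
pairs = back_translate([["x", "y", "z"], ["w"]], toy_reverse_model, seed=1)
```

For the Chinese-Vietnamese direction, for example, Vietnamese monolingual sentences would be fed through a Vietnamese-Chinese model. The resulting (synthetic Chinese, real Vietnamese) pairs would then be added to the training data. Sampling tends to give more diverse synthetic sources than beam search, which is the usual motivation for this variant.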