Text Style Transfer Back-Translation

Back Translation (BT) is widely used in machine translation because it has proven effective at improving translation quality. However, BT mainly improves the translation of inputs that share a similar style (more specifically, translation-like inputs), since the source side of BT data is machine-translated. For natural inputs, BT brings only slight improvements and sometimes even has adverse effects. To address this issue, we propose Text Style Transfer Back Translation (TST BT), which uses a style transfer model to modify the source side of BT data. By making the style of the source-side text more natural, we aim to improve the translation of natural inputs. Our experiments on various language pairs, including both high-resource and low-resource ones, demonstrate that TST BT significantly improves translation performance against popular BT benchmarks. In addition, TST BT proves effective in domain adaptation, so the strategy can be regarded as a general data augmentation method. Our training code and text style transfer model are open-sourced.
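The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the released training code: the function `build_tst_bt_corpus` and the callables `back_translate` and `style_transfer` are hypothetical names, and the two toy stand-ins only mimic what a real target-to-source translation model and a real style transfer model would do.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical type alias: any function mapping one sentence to another.
TextModel = Callable[[str], str]


def build_tst_bt_corpus(
    target_monolingual: Iterable[str],
    back_translate: TextModel,
    style_transfer: TextModel,
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs in the spirit of TST BT.

    Assumed steps, following the abstract:
      1. Back-translate each target-language sentence into the source language.
      2. Rewrite the machine-translated source so its style is more natural.
      3. Pair the rewritten source with the original target sentence.
    """
    pairs = []
    for target_sentence in target_monolingual:
        synthetic_source = back_translate(target_sentence)  # translation-like style
        natural_source = style_transfer(synthetic_source)   # more natural style
        pairs.append((natural_source, target_sentence))
    return pairs


# Toy stand-ins for illustration only; real models would replace these.
def toy_back_translate(sentence: str) -> str:
    return "mt:" + sentence


def toy_style_transfer(sentence: str) -> str:
    return sentence.replace("mt:", "nat:", 1)


corpus = build_tst_bt_corpus(
    ["guten Morgen", "danke"], toy_back_translate, toy_style_transfer
)
```

In this sketch the resulting pairs would be mixed with the genuine parallel data when training the forward translation model, as in ordinary BT; only the source side of the synthetic pairs differs.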

