Learning multiscale Transformer models has proven to be a viable approach to improving machine translation systems. Prior research has primarily treated subwords as the basic units in such systems, and the incorporation of fine-grained character-level features into multiscale Transformers has not yet been explored. In this work, we present a \textbf{S}low-\textbf{F}ast two-stream learning model, referred to as Tran\textbf{SF}ormer, which uses a ``slow'' branch to process subword sequences and a ``fast'' branch to process the longer character sequences. The model is efficient because the fast branch is made very lightweight by reducing its model width, yet it still provides useful fine-grained features to the slow branch. Our TranSFormer shows consistent improvements of more than 1 BLEU point on several machine translation benchmarks.