We introduce Efficient Monotonic Multihead Attention (EMMA), a state-of-the-art simultaneous translation model with numerically stable and unbiased monotonic alignment estimation. In addition, we present improved training and inference strategies, including simultaneous fine-tuning from an offline translation model and reduction of monotonic alignment variance. Experimental results demonstrate that the proposed model attains state-of-the-art performance in simultaneous speech-to-text translation on the Spanish and English translation task.