The Transformer model has revolutionized Natural Language Processing tasks such as Neural Machine Translation, and many studies of the Transformer architecture have improved its efficiency and accuracy. One remaining area for improvement is the computation spent on empty tokens, which the Transformer processes only to discard later, creating an unnecessary computational burden. To address this, we propose an algorithm that sorts translation sentence pairs by length before batching, minimizing wasted computation. Because sorting the data too thoroughly could violate the independent and identically distributed (i.i.d.) data assumption, we sort the data only partially. In experiments, we apply the proposed method to English-Korean and English-Luganda machine translation and show that it reduces computation time while maintaining translation performance. Our method is independent of the model architecture, so it can be easily integrated into any training process that handles variable-length data.
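As a rough illustration of the idea of sorting by length only partially before batching, the sketch below shuffles the sentence pairs, sorts them by length inside fixed-size chunks, cuts each chunk into batches, and then shuffles the batch order. This is one possible reading under stated assumptions, not the procedure described by the authors: the chunking scheme, the sort key (the longer side of each pair), and the names `partially_sorted_batches`, `padded_positions`, and `chunk_batches` are all hypothetical and introduced here for illustration only.

```python
import random
from typing import List, Sequence, Tuple

# A sentence pair as (source token ids, target token ids); hypothetical layout.
Pair = Tuple[Sequence[int], Sequence[int]]


def partially_sorted_batches(
    pairs: Sequence[Pair],
    batch_size: int,
    chunk_batches: int,
    seed: int = 0,
) -> List[List[Pair]]:
    """Group pairs into batches of similar length without a full global sort.

    chunk_batches (an assumed knob) sets how many batches' worth of data are
    sorted together: 1 leaves batch membership effectively random, while
    larger values move towards a fully sorted dataset.
    """
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)  # start from a random order

    chunk_size = batch_size * chunk_batches
    batches: List[List[Pair]] = []
    for start in range(0, len(shuffled), chunk_size):
        # Sort only inside this chunk, keyed on the longer side of the pair.
        chunk = sorted(
            shuffled[start:start + chunk_size],
            key=lambda p: max(len(p[0]), len(p[1])),
        )
        for i in range(0, len(chunk), batch_size):
            batches.append(chunk[i:i + batch_size])

    rng.shuffle(batches)  # avoid feeding batches in length order
    return batches


def padded_positions(batches: Sequence[Sequence[Pair]]) -> int:
    """Count padding slots when each side is padded to its batch maximum."""
    total = 0
    for batch in batches:
        for side in (0, 1):
            longest = max(len(p[side]) for p in batch)
            total += sum(longest - len(p[side]) for p in batch)
    return total
```

Calling `padded_positions` on the output for `chunk_batches=1` and for a larger value gives a simple way to compare how many empty positions each setting would produce. How much sorting is acceptable before training quality suffers is an empirical question that this sketch does not answer.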