Large language models (LLMs) have achieved state-of-the-art performance in various language processing tasks, motivating their adoption in simultaneous translation. Current fine-tuning methods to adapt LLMs for simultaneous translation focus on prompt optimization strategies using either data augmentation or prompt structure modifications. However, these methods suffer from several issues, such as unnecessarily expanded training sets, computational inefficiency from discarding the key-value cache, increased prompt sizes, or restriction to a single decision policy. To eliminate these issues, in this work, we propose SimulMask, a new paradigm for fine-tuning LLMs for simultaneous translation. It utilizes a novel attention mask approach that models simultaneous translation during fine-tuning by masking attention according to a desired decision policy. Applying SimulMask to a Falcon LLM on the IWSLT 2017 dataset, we observe significant translation-quality improvements over state-of-the-art prompt optimization strategies on five language pairs, while reducing computational cost.
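To make the core idea concrete, the sketch below builds an attention mask for a wait-k decision policy, where the i-th target token may attend only to the source tokens that would have been read at that point. This is a minimal illustration of masking attention according to a decision policy; the function name, the NumPy formulation, and the wait-k choice are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def wait_k_attention_mask(src_len: int, tgt_len: int, k: int) -> np.ndarray:
    """Hypothetical sketch: boolean attention mask (True = may attend)
    modeling a wait-k simultaneous-translation policy during fine-tuning.

    Target position i attends to the first min(k + i, src_len) source
    tokens (the source read so far) plus all earlier target tokens."""
    n = src_len + tgt_len
    mask = np.zeros((n, n), dtype=bool)
    # Source tokens attend causally among themselves.
    mask[:src_len, :src_len] = np.tril(np.ones((src_len, src_len), dtype=bool))
    for i in range(tgt_len):
        row = src_len + i
        visible_src = min(k + i, src_len)
        mask[row, :visible_src] = True        # source tokens read so far
        mask[row, src_len:row + 1] = True     # causal attention over target
    return mask
```

Because the policy is encoded entirely in the mask, fine-tuning can reuse a standard causal-LM training loop: the same prompt and cache are used throughout, rather than re-encoding or discarding state at each read/write decision.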