Large language models (LLMs) have recently demonstrated promising performance in simultaneous machine translation (SimulMT). However, applying decoder-only LLMs to SimulMT introduces a positional mismatch, which creates a dilemma between decoding efficiency and positional consistency. Existing approaches often rely on specific positional encodings or carefully designed prompting schemes, and thus fail to simultaneously achieve inference efficiency, positional consistency, and broad model compatibility. In this work, we propose ExPosST, a general framework that resolves this dilemma through explicit position allocation. ExPosST reserves fixed positional slots for incoming source tokens, enabling efficient KV-cache decoding across different positional encoding methods. To further bridge the gap between fine-tuning and inference, we introduce a policy-consistent fine-tuning strategy that aligns training with inference-time decoding behavior. Experiments across multiple language pairs demonstrate that ExPosST effectively supports simultaneous translation under diverse policies.
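To make the idea of explicit position allocation concrete, the following minimal Python sketch illustrates one way fixed positional slots could be reserved for source tokens: a hypothetical budget `S_MAX` of position IDs is set aside for the source, and target positions start after that budget, so newly arriving source tokens never shift positions already held in the KV cache. The names (`S_MAX`, `allocate_positions`) and the specific budget value are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch (not ExPosST's actual code): explicit position allocation
# for streaming source tokens. We assume a hypothetical budget S_MAX of
# positional slots reserved for the source; target positions are offset past it.

S_MAX = 512  # hypothetical reserved source slot budget (assumption)

def allocate_positions(num_source_tokens: int, num_target_tokens: int):
    """Return (source_positions, target_positions) under explicit allocation."""
    assert num_source_tokens <= S_MAX, "source exceeds the reserved slot budget"
    # Incoming source tokens fill the reserved slots 0 .. S_MAX-1.
    source_positions = list(range(num_source_tokens))
    # Target tokens start at a fixed offset, so their positions never shift
    # when more source tokens arrive, keeping cached key/value entries valid.
    target_positions = [S_MAX + i for i in range(num_target_tokens)]
    return source_positions, target_positions

print(allocate_positions(3, 2))  # ([0, 1, 2], [512, 513])
print(allocate_positions(5, 2))  # ([0, 1, 2, 3, 4], [512, 513]) -- target positions unchanged
```

Under this scheme, growing the source stream only appends new position IDs inside the reserved range; positions already assigned to generated target tokens stay fixed, which is what allows efficient decoding with the KV cache regardless of the underlying positional encoding.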