Large language models (LLMs) have recently demonstrated promising performance in simultaneous machine translation (SimulMT). However, applying decoder-only LLMs to SimulMT introduces a positional mismatch: keeping token positions consistent with offline training requires re-encoding the growing source prefix, while reusing the KV cache leaves positions inconsistent, creating a dilemma between decoding efficiency and positional consistency. Existing approaches often rely on specific positional encodings or carefully designed prompting schemes, and thus fail to achieve inference efficiency, positional consistency, and broad model compatibility at the same time. In this work, we propose ExPosST, a general framework that resolves this dilemma through explicit position allocation. ExPosST reserves fixed positional slots for incoming source tokens, enabling efficient decoding with a KV cache across different positional encoding methods. To further bridge the gap between fine-tuning and inference, we introduce a policy-consistent fine-tuning strategy that aligns training with inference-time decoding behavior. Experiments across multiple language pairs demonstrate that ExPosST effectively supports simultaneous translation under diverse policies.
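To make the explicit position allocation idea concrete, the following is a minimal sketch under our own assumptions: a fixed budget of reserved source slots and an illustrative helper. The names `MAX_SRC` and `allocate_positions` are hypothetical and do not reflect ExPosST's actual interface.

```python
# A minimal sketch of explicit position allocation (illustrative, not the
# paper's implementation). We assume an upper bound MAX_SRC on source length
# and reserve positions 0..MAX_SRC-1 exclusively for source tokens.

MAX_SRC = 512  # hypothetical reserved slot budget for source tokens


def allocate_positions(num_src_read: int, num_tgt_written: int):
    """Return position ids for the current source and target prefixes.

    Source token i always occupies slot i, so a newly read source token
    extends the sequence without shifting any earlier position, and
    previously cached keys/values stay valid. Target token j always
    occupies slot MAX_SRC + j, regardless of how much source has arrived.
    """
    src_positions = list(range(num_src_read))
    tgt_positions = [MAX_SRC + j for j in range(num_tgt_written)]
    return src_positions, tgt_positions


# Under an incremental policy (e.g., wait-k), reading one more source token
# or emitting one more target token never changes an existing position id:
assert allocate_positions(3, 2) == ([0, 1, 2], [512, 513])
assert allocate_positions(4, 2)[0] == [0, 1, 2, 3]      # source grew; targets unmoved
assert allocate_positions(4, 3)[1] == [512, 513, 514]   # target grew by one slot
```

Because the position id of every cached token is fixed once assigned, keys and values already in the cache never need recomputation when new source tokens arrive, which is what allows efficient incremental decoding under different positional encoding schemes.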