Simultaneous Machine Translation (SiMT) requires high-quality translations under strict real-time constraints, which traditional policies with only READ/WRITE actions cannot fully address. We extend the action space of SiMT with four adaptive actions: Sentence_Cut, Drop, Partial_Summarization and Pronominalization, which enable real-time restructuring, omission, and simplification while preserving semantic fidelity. We implement these actions in a large language model (LLM) framework and construct training references through action-aware prompting. To evaluate both quality and word-level monotonicity, we further develop a latency-aware TTS pipeline that maps textual outputs to speech with realistic timing. Experiments on the ACL60/60 English-Chinese, English-German and English-Japanese benchmarks show that our framework consistently improves semantic metrics and achieves lower delay than both reference translations and salami-based baselines. Notably, combining Drop and Sentence_Cut consistently improves the balance between fluency and latency. These results suggest that enriching the action space of LLM-based SiMT is a promising direction for bridging the gap between human and machine interpretation.