Large Language Models (LLMs) have made remarkable strides in multilingual translation, but a systemic cross-lingual verbosity bias renders them unsuitable for strictly time-constrained tasks such as subtitling and dubbing. Current prompt-engineering approaches struggle to resolve the conflict between semantic fidelity and rigid temporal feasibility. To bridge this gap, we first introduce Sand-Glass, a benchmark specifically designed to evaluate translation under syllable-level duration constraints. We then propose HOMURA, a reinforcement learning framework that explicitly optimizes the trade-off between semantic preservation and temporal compliance. By employing a KL-regularized objective with a novel dynamic syllable-ratio reward, HOMURA effectively "tames" output length. Experimental results demonstrate that our method significantly outperforms strong LLM baselines, achieving precise length control that respects linguistic density hierarchies without compromising semantic adequacy.
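To make the reward design concrete, the sketch below shows one plausible instantiation of a KL-regularized objective with a syllable-ratio reward. The functional forms, hyperparameters, and all names (`syllable_ratio_reward`, `total_reward`, `alpha`, `beta`) are illustrative assumptions, not HOMURA's actual definitions, which the abstract does not specify; the "dynamic" aspect is modeled here, again as an assumption, by a per-language-pair target ratio.

```python
# A minimal, illustrative sketch of a KL-regularized reward with a
# syllable-ratio term. The exact functional forms and hyperparameters
# are assumptions for exposition, not the paper's definitions.
import math

def syllable_ratio_reward(src_syllables: int, hyp_syllables: int,
                          target_ratio: float = 1.0,
                          tolerance: float = 0.1) -> float:
    """Peak reward when the hypothesis/source syllable ratio hits the
    target; exponential decay outside the tolerance band (assumed shape).
    target_ratio could be set per language pair to reflect differing
    linguistic densities (one reading of the "dynamic" reward)."""
    ratio = hyp_syllables / max(src_syllables, 1)
    deviation = abs(ratio - target_ratio)
    if deviation <= tolerance:
        return 1.0
    return math.exp(-(deviation - tolerance) / tolerance)

def total_reward(semantic_score: float, src_syllables: int,
                 hyp_syllables: int, kl_to_ref: float,
                 alpha: float = 0.5, beta: float = 0.05) -> float:
    """Scalar RL reward: semantic adequacy plus weighted length
    compliance, minus a KL penalty to the reference policy."""
    length_term = syllable_ratio_reward(src_syllables, hyp_syllables)
    return semantic_score + alpha * length_term - beta * kl_to_ref

# Example: an adequate translation that runs two syllables long and
# drifts slightly from the reference policy.
print(total_reward(semantic_score=0.85, src_syllables=12,
                   hyp_syllables=14, kl_to_ref=0.3))
```

In this reading, the KL term keeps the policy close to the reference translator so that length compliance is not bought at the cost of semantic drift, mirroring the trade-off the abstract describes.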