Simultaneous translation is a task in which translation begins before the speaker has finished speaking. Evaluating it therefore requires measuring the latency of the translation as well as its quality. Latency should be as small as possible so that users can follow what the speaker says with little delay. Existing latency metrics focus on when the translation starts but do not adequately account for when it ends. As a result, they do not penalize the latency caused by long translation outputs, even though such outputs delay users' comprehension. In this work, we propose a novel latency metric called Average Token Delay (ATD) that focuses on the end timings of partial translations in simultaneous translation. We discuss the advantages of ATD using simulated examples and investigate the differences between ATD and Average Lagging through simultaneous translation experiments.
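The contrast between start-based and end-based latency can be illustrated with a toy computation. This is a minimal sketch under stated assumptions, not ATD as defined in the paper. `average_lagging` follows the usual token-based definition of Average Lagging. `end_time_token_delay` is a hypothetical, simplified end-timing delay: it pairs every output token with the last input token read before that token's chunk was emitted, and it assumes output tokens are delivered one after another with a fixed duration `token_dur`. The chunk format `(n_read, n_out)` and all helper names are illustrative.

```python
from typing import List, Sequence, Tuple

# A chunk is (n_read, n_out): n_out output tokens are emitted after the
# first n_read input tokens have been read. (Illustrative format.)
Chunk = Tuple[int, int]


def average_lagging(chunks: Sequence[Chunk], src_len: int) -> float:
    """Token-based Average Lagging: depends only on when each output token is emitted."""
    # g[t-1]: number of input tokens read before output token t was emitted
    g = [n_read for n_read, n_out in chunks for _ in range(n_out)]
    rate = len(g) / src_len  # |Y| / |X|
    # tau: first output token (1-based) emitted after the whole input was read
    tau = next(t for t, read in enumerate(g, start=1) if read >= src_len)
    return sum(g[t - 1] - (t - 1) / rate for t in range(1, tau + 1)) / tau


def end_time_token_delay(
    in_ends: Sequence[float], chunks: Sequence[Chunk], token_dur: float
) -> float:
    """Hypothetical simplified end-timing delay (NOT the paper's exact ATD).

    Mean over output tokens of: (end time of the output token) minus
    (end time of the last input token read before its chunk was emitted).
    Assumes output tokens are delivered sequentially, each lasting token_dur
    seconds, and that a chunk cannot start before the previous one finishes.
    """
    clock = 0.0
    delays: List[float] = []
    for n_read, n_out in chunks:
        read_end = in_ends[n_read - 1]   # time the required input has been heard
        clock = max(clock, read_end)     # wait for the input and for earlier output
        for _ in range(n_out):
            clock += token_dur           # end time of this output token
            delays.append(clock - read_end)
    return sum(delays) / len(delays)


in_ends = [1.0, 2.0, 3.0, 4.0]   # input token i ends at i seconds
short_out = [(2, 2), (4, 2)]     # same read schedule ...
long_out = [(2, 4), (4, 4)]      # ... but twice as many output tokens

print(average_lagging(short_out, 4), average_lagging(long_out, 4))
print(end_time_token_delay(in_ends, short_out, 0.5),
      end_time_token_delay(in_ends, long_out, 0.5))
```

Both outputs start at the same moments, but the longer one finishes later. In this toy case the AL value drops (from about 1.67 to 1.4 input tokens) while the end-timing delay rises (from 0.75 s to 1.25 s). The two quantities are in different units, so only the direction of change is meaningful here.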