Machine interpreting (MI), the real-time application of speech translation, has achieved remarkable progress on standard benchmarks, with some systems approaching human parity on textual fidelity. Yet the user experience remains far inferior to interpreter-mediated communication, revealing what we term the accuracy illusion: systems appear accurate on paper but fail in practice to support smooth, goal-oriented interaction. This paper defines MI as a distinct subfield of speech translation, one with its own characteristics and a need for evaluation methods grounded in communicative effectiveness rather than isolated fidelity metrics. Drawing on insights from interpreting studies, we identify critical dimensions of professional interpreting practice that current systems overlook, and consolidate them into three interdependent design priorities for future MI: agency (context-sensitive initiative and repair), grounding (multimodal and discourse-level situational awareness), and experience (adaptive improvement through real interaction). Together, these priorities chart a path toward closing the usability gap and enabling systems that can sustain authentic multilingual communication in real time.