Machine interpreting (MI), the real-time branch of speech translation, has achieved remarkable progress on standard benchmarks, with some systems approaching human parity on textual fidelity. Yet the user experience remains far inferior to interpreter-mediated communication, revealing what we term the \emph{accuracy illusion}: systems appear accurate on paper but fail in practice to support smooth, goal-oriented interaction. This paper defines MI as a distinct subfield of speech translation with its own characteristics, one that requires evaluation methods grounded in communicative effectiveness rather than isolated fidelity metrics. Drawing on insights from interpreting studies, we identify critical dimensions of professional interpreting practice that current systems overlook, and consolidate them into three interdependent design priorities for future MI: \emph{agency} (context-sensitive initiative and repair), \emph{grounding} (multimodal and discourse-level situational awareness), and \emph{experience} (adaptive improvement through real interaction). Together, these priorities chart a path toward closing the usability gap and enabling systems that can sustain authentic multilingual communication in real time.