Automatic speech recognition (ASR) systems generate real-time transcriptions but often miss nuances that human interpreters capture. While ASR is useful in many contexts, interpreters, who already use ASR tools such as Dragon, add critical value, especially in sensitive settings such as diplomatic meetings where subtle language is key. Human interpreters not only perceive these nuances but also adjust in real time, improving accuracy, while ASR handles basic transcription tasks. However, ASR systems introduce a delay that does not align with the needs of real-time interpretation. The user-perceived latency of an ASR system differs from that of interpretation: it measures the time between when speech is uttered and when its transcription is delivered. To address this, we propose a new approach to measuring delay in ASR systems and validate whether they are usable in live interpretation scenarios.
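The notion of user-perceived latency above can be made concrete as the gap between when a word finishes being spoken and when its transcription reaches the user. The following is a minimal sketch of that computation, not the paper's actual method; the timestamps and function name are illustrative, and it assumes the ASR system exposes word-level emission times.

```python
from statistics import mean

def user_perceived_latency(word_end_times, emission_times):
    """Per-word latency in seconds: transcript emission time minus
    the time the spoken word actually ended (both on the same clock)."""
    return [e - w for w, e in zip(word_end_times, emission_times)]

# Hypothetical timings: three words end at these times (s) in the audio,
# and the ASR system emits their transcriptions slightly later.
spoken = [0.4, 0.9, 1.5]
emitted = [1.1, 1.6, 2.4]

latencies = user_perceived_latency(spoken, emitted)
print(round(mean(latencies), 2))  # average delay the user perceives
```

Aggregating per-word gaps with a mean (or a high percentile, for worst-case behavior) gives a single figure that can be compared against the delay tolerances of live interpretation.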