Automatic speech recognition (ASR) correction has traditionally focused on isolated utterances or short local contexts. However, as text and speech become increasingly interleaved in long interactions, ASR correction requires conversation-level contextual evidence. Existing ASR correction methods often rely on the current hypothesis or concatenate raw dialogue history. When raw history is concatenated, the sparse evidence relevant to a correction can be difficult to locate amid redundancy and noise. To address these challenges, we propose an ontology memory-augmented ASR correction framework for long text-speech interleaved conversations. The framework organizes preceding interaction history into a dynamically updatable ontology memory, where entities, terminology, surface variants, potential ASR confusions, and semantic relations are stored as retrievable nodes for context-grounded correction. To evaluate this setting, we construct RAMC-Corr, a dataset derived from MAGIC-RAMC for long-range ASR correction with grounded context. Experiments on RAMC-Corr show that our method improves over direct correction in 9 out of 10 paired backbone-setting combinations and encourages more selective and evidence-grounded corrections for context-dependent ASR errors.
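The ontology memory described above can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `OntologyNode`, `OntologyMemory`, `update`, `retrieve`, and `correct` are hypothetical, retrieval is reduced to case-insensitive substring matching, and the correction step is a rule-based stand-in for whatever model the framework actually uses.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class OntologyNode:
    """One retrievable memory node (hypothetical schema, not from the paper)."""
    canonical: str                                   # preferred surface form
    kind: str = "entity"                             # e.g. "entity" or "term"
    variants: set[str] = field(default_factory=set)      # accepted surface variants
    confusions: set[str] = field(default_factory=set)    # likely ASR mis-hearings
    relations: dict[str, set[str]] = field(default_factory=dict)  # relation -> targets


class OntologyMemory:
    """Dynamically updatable store of nodes built from preceding turns."""

    def __init__(self) -> None:
        self.nodes: dict[str, OntologyNode] = {}

    def update(self, canonical, kind="entity", variants=(), confusions=(), relations=()):
        # Create the node on first mention, otherwise merge new evidence into it.
        node = self.nodes.setdefault(canonical, OntologyNode(canonical, kind))
        node.variants.update(variants)
        node.confusions.update(confusions)
        for relation, target in relations:
            node.relations.setdefault(relation, set()).add(target)
        return node

    def retrieve(self, hypothesis: str) -> list[OntologyNode]:
        # Assumption: substring matching stands in for a real retriever.
        text = hypothesis.lower()
        hits = []
        for node in self.nodes.values():
            surfaces = {node.canonical} | node.variants | node.confusions
            if any(surface.lower() in text for surface in surfaces):
                hits.append(node)
        return hits


def correct(hypothesis: str, memory: OntologyMemory) -> str:
    """Rewrite only spans backed by a retrieved node; otherwise leave text as-is."""
    corrected = hypothesis
    for node in memory.retrieve(hypothesis):
        # Longest confusions first so shorter ones do not clobber them.
        for wrong in sorted(node.confusions, key=len, reverse=True):
            corrected = re.sub(
                re.escape(wrong),
                lambda _match, canonical=node.canonical: canonical,
                corrected,
                flags=re.IGNORECASE,
            )
    return corrected


if __name__ == "__main__":
    memory = OntologyMemory()
    # Evidence gathered from an earlier (text) turn of the conversation.
    memory.update(
        "MAGIC-RAMC",
        confusions={"magic ram see"},
        relations=[("source_of", "RAMC-Corr")],
    )
    print(correct("we trained on magic ram see data", memory))
```

The point of the sketch is the selectivity: a hypothesis with no matching node is returned unchanged, which mirrors the evidence-grounded behaviour the abstract describes.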