Entity recognition in Automatic Speech Recognition (ASR) is challenging for rare and domain-specific terms. In domains such as finance, medicine, and air traffic control, these errors are costly. When an entity is entirely absent from the ASR output, post-ASR correction becomes difficult. To address this, we introduce RECOVER, an agentic correction framework that operates as a tool-using agent. It uses multiple ASR hypotheses as evidence, retrieves relevant entities, and applies Large Language Model (LLM) correction under constraints. The hypotheses are used according to one of four strategies: 1-Best, Entity-Aware Select, Recognizer Output Voting Error Reduction (ROVER) Ensemble, and LLM-Select. Evaluated across five diverse datasets, RECOVER achieves 8-46% relative reductions in entity-phrase word error rate (E-WER) and increases recall by up to 22 percentage points. LLM-Select achieves the best overall entity-correction performance while maintaining overall WER.
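To make the reported quantities concrete, the following minimal sketch shows how a word error rate and a relative reduction can be computed. It is an illustration under stated assumptions, not the evaluation code behind the results above: the function names `word_error_rate` and `relative_reduction` are hypothetical, the example transcripts and numbers are invented, and E-WER is assumed here to be an ordinary WER restricted to entity phrases, which the abstract does not spell out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Standard Levenshtein distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, corrected: float) -> float:
    """Fraction of the baseline error removed by correction."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - corrected) / baseline


if __name__ == "__main__":
    # Invented entity phrase and hypotheses, for illustration only.
    entity = "contact tower on one one eight"
    before = word_error_rate(entity, "contact our on one one eight")
    after = word_error_rate(entity, "contact tower on one one eight")
    print(f"before={before:.3f} after={after:.3f}")
    print(f"relative reduction={relative_reduction(before, after):.0%}")
```

A relative reduction is a ratio of error rates, whereas the recall gain is stated in percentage points, that is, as an absolute difference between two recall values. The two figures are therefore not directly comparable.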