End-to-end automatic speech recognition (E2E ASR) systems often mistranscribe domain-specific phrases such as named entities, which can lead to catastrophic failures in downstream tasks. A family of fast and lightweight named entity correction (NEC) models for ASR has recently been proposed; these models typically build on phonetic-level edit distance algorithms and have shown impressive NEC performance. However, as the named entity (NE) list grows, phonetic confusion within the list is exacerbated; for example, homophone ambiguities increase substantially. In view of this, we propose a novel Description Augmented Named entity CorrEctoR (dubbed DANCER), which leverages entity descriptions to provide additional information that helps mitigate phonetic confusion for NEC on ASR transcriptions. To this end, we introduce an efficient entity description augmented masked language model (EDA-MLM) that includes a dense retrieval model, enabling the MLM to adapt swiftly to domain-specific entities for the NEC task. A series of experiments conducted on the AISHELL-1 and Homophone datasets confirms the effectiveness of our modeling approach. DANCER outperforms a strong baseline, the phonetic edit-distance-based NEC model (PED-NEC), with a relative character error rate (CER) reduction of about 7% on named entities in AISHELL-1. More notably, when tested on the Homophone dataset, which contains named entities with high phonetic confusion, DANCER achieves a more pronounced relative CER reduction of 46% over PED-NEC on named entities.
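To make the homophone problem concrete, the following is a minimal sketch of phonetic edit-distance matching against an NE list. It is not the PED-NEC implementation or any part of DANCER; the function names, the toy entity list, and the hand-written phoneme sequences are all hypothetical and chosen only for illustration. It shows that two entities with identical pronunciations tie under a purely phonetic distance, which is the kind of ambiguity that entity descriptions are meant to help resolve.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme-token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_entities(span: Sequence[str],
                     entity_phonemes: Dict[str, List[str]]) -> List[str]:
    """Return every entity tied for the smallest phonetic distance to the span."""
    distances = {name: edit_distance(span, phonemes)
                 for name, phonemes in entity_phonemes.items()}
    best = min(distances.values())
    return sorted(name for name, d in distances.items() if d == best)


# Hypothetical toy NE list (assumption): entity_A and entity_B are homophones,
# i.e. different written forms that share one phoneme sequence.
TOY_ENTITIES = {
    "entity_A": ["zh", "ang1", "w", "ei3"],
    "entity_B": ["zh", "ang1", "w", "ei3"],
    "entity_C": ["l", "i3", "m", "ing2"],
}

# Phonemes of a misrecognized span in the ASR output (final tone is wrong).
span = ["zh", "ang1", "w", "ei4"]
candidates = closest_entities(span, TOY_ENTITIES)
print(candidates)  # both homophones are returned; phonetics alone cannot choose
```

In this toy setting the phonetic distance leaves two equally good candidates, so a corrector needs some non-phonetic signal, such as the entity descriptions and surrounding context that DANCER draws on, to pick between them.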