With the development of deep learning, automatic speech recognition (ASR) has made significant progress. To further improve ASR performance, revising recognition results is a lightweight yet effective approach. Existing methods can be roughly divided into N-best reranking and error correction. The former selects the hypothesis with the lowest error rate from a set of candidates that the ASR system generates for a given input utterance. The latter detects recognition errors in a given hypothesis and corrects them to obtain an improved result. However, we observe that these studies are hardly comparable with one another, because they are usually evaluated on different corpora, paired with different ASR models, and even trained on different datasets. Accordingly, in this study we first provide an ASR hypothesis revising (HypR) dataset. HypR covers several commonly used corpora (AISHELL-1, TED-LIUM 2, and LibriSpeech) and provides 50 recognition hypotheses for each speech utterance. The ASR model checkpoints are also released. In addition, we implement and compare several classic and representative methods, showing recent progress in revising speech recognition results. We hope that the publicly available HypR dataset can serve as a reference benchmark for subsequent research and help advance the field.
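To make the N-best reranking objective concrete, the following is a minimal sketch of the oracle selection it aims to approximate: given a reference transcript, pick the candidate with the lowest word error rate. The function names (`edit_distance`, `oracle_select`) and the toy sentences are illustrative assumptions and are not part of HypR. A real reranker has no access to the reference at inference time, so this computes only the best result a reranker could reach on a given N-best list. Whitespace tokenization is also an assumption; for a Mandarin corpus such as AISHELL-1, character-level tokens would typically be used instead.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def oracle_select(reference: str, hypotheses: List[str]) -> Tuple[int, float]:
    """Return (index, error rate) of the hypothesis closest to the reference.

    Hypothetical helper: tokens are split on whitespace, and ties are
    resolved in favor of the earliest candidate in the list.
    """
    ref_tokens = reference.split()
    best_idx, best_err = 0, float("inf")
    for idx, hyp in enumerate(hypotheses):
        errors = edit_distance(ref_tokens, hyp.split())
        rate = errors / max(len(ref_tokens), 1)
        if rate < best_err:
            best_idx, best_err = idx, rate
    return best_idx, best_err


# Toy N-best list (made up for illustration; HypR provides 50 per utterance).
nbest = [
    "the cat sat on a mat",
    "the cat sat on the mat",
    "a cat sat mat",
]
print(oracle_select("the cat sat on the mat", nbest))  # (1, 0.0)
```

Averaging the oracle error rate over a test set gives a lower bound against which reranking methods can be compared, while the error rate of the top-ranked hypothesis gives the unrevised baseline.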