With the development of deep learning, automatic speech recognition (ASR) has made significant progress. To further enhance performance, revising recognition results is a lightweight yet effective approach. Existing methods can be roughly classified into two categories: N-best reranking methods and error correction models. The former aims to select the hypothesis with the lowest error rate from a set of candidates generated by an ASR system for a given input utterance. The latter focuses on detecting recognition errors in a given hypothesis and correcting them to obtain an improved result. However, we observe that these studies are hardly comparable with one another, as they are usually evaluated on different corpora, paired with different ASR models, and even trained on different datasets. Accordingly, in this study we first focus on releasing an ASR hypothesis revising (HypR) dataset. HypR contains several commonly used corpora (AISHELL-1, TED-LIUM 2, and LibriSpeech) and provides 50 recognition hypotheses for each speech utterance. The checkpoints of the ASR models are also released. In addition, we implement and compare several classic and representative methods, showing the recent research progress in revising speech recognition results. We hope the publicly available HypR dataset can become a reference benchmark for subsequent research and help advance this line of research.
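To make the N-best reranking objective concrete, the sketch below picks the hypothesis with the lowest word error rate (WER) from a candidate list. It uses the reference transcript, so it computes the "oracle" choice: the best result any selection-based method could reach on that list. A real reranker never sees the reference and must instead score hypotheses with its own model. The function names (`edit_distance`, `wer`, `oracle_select`) and the toy sentences are hypothetical illustrations, not part of the HypR release.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by reference length."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def oracle_select(reference, nbest):
    """Return (index, text) of the N-best hypothesis with the lowest WER."""
    best = min(range(len(nbest)), key=lambda k: wer(reference, nbest[k]))
    return best, nbest[best]


reference = "the cat sat on the mat"
nbest = ["the cat sat on a mat", "the cat sat on the mat", "a cat sat on mat"]
print(oracle_select(reference, nbest))  # (1, 'the cat sat on the mat')
```

Here the top-ranked first hypothesis has a WER of 1/6, while the second matches the reference exactly, so a perfect reranker would promote it.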