With the development of deep learning, automatic speech recognition (ASR) has made significant progress. Revising recognition results is a lightweight yet effective way to further improve performance. Existing methods can be roughly divided into N-best reranking methods and error correction models. The former select the hypothesis with the lowest error rate from a set of candidates generated by the ASR system for a given input utterance. The latter detect recognition errors in a given hypothesis and correct them to obtain an improved result. However, these studies are hardly comparable with one another, because they are usually evaluated on different corpora, paired with different ASR models, and even trained on different datasets. Accordingly, this study first concentrates on releasing an ASR hypothesis revising (HypR) dataset. HypR covers several commonly used corpora (AISHELL-1, TED-LIUM 2, and LibriSpeech) and provides 50 recognition hypotheses for each speech utterance. The checkpoints of the ASR models are also released. In addition, we implement and compare several classic and representative methods, showing recent research progress in revising speech recognition results. We hope the publicly available HypR dataset can become a reference benchmark for subsequent research and help advance this line of research.
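To make the N-best reranking objective concrete, the following is a minimal sketch of the oracle selection it aims to approximate: given the reference transcript, pick the candidate with the lowest word error rate. The function names (`edit_distance`, `error_rate`, `oracle_select`) and the plain-string input format are illustrative assumptions, not the HypR data format or the authors' code. For a Mandarin corpus such as AISHELL-1, the same idea would typically be applied to characters rather than whitespace-separated words.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance divided by the reference length."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def oracle_select(reference: str, hypotheses: List[str]) -> Tuple[int, float]:
    """Return (index, error rate) of the lowest-error hypothesis.

    Hypothetical helper: ties are broken in favor of the earliest candidate.
    """
    rates = [error_rate(reference, h) for h in hypotheses]
    best = min(range(len(rates)), key=rates.__getitem__)
    return best, rates[best]
```

A reranker has no access to the reference at inference time, so this oracle only gives the lower bound on the error rate that any reranking method can reach on a given N-best list.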