Neural sequence-to-sequence systems deliver state-of-the-art performance for automatic speech recognition. When using appropriate modeling units, e.g., byte-pair encoded characters, these systems are in principle open-vocabulary systems. In practice, however, they often fail to recognize words not seen during training, e.g., named entities, acronyms, or domain-specific special words. To address this problem, many context biasing methods have been proposed; however, for words with a pronunciation-orthography mismatch, these methods may still struggle. We propose a method that allows corrections of substitution errors to improve the recognition accuracy of such challenging words. Users can add corrections on the fly during inference. We show that this method yields a relative improvement in biased word error rate of up to 8%, while maintaining a competitive overall word error rate.