We propose a novel neural network architecture, based on the Conformer Transducer, that adds a contextual information flow to ASR systems. Our method improves recognition accuracy for uncommon words without degrading the word error rate on regular words. We examine the improvement in uncommon-word accuracy obtained with the new model, with shallow fusion with a context language model, and with both together. We find that combining the two yields a cumulative gain in uncommon-word recognition accuracy.
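To make the shallow-fusion component concrete, the following is a minimal sketch of the general technique, not the implementation used in this work. It assumes the standard formulation in which the ASR log-probability is interpolated with a weighted context language model log-probability. For brevity the interpolation is applied to complete hypotheses, whereas in practice it is usually applied at each beam-search step. The names `toy_context_lm`, `shallow_fusion_score`, `pick_best`, the weight 0.3, and all scores are hypothetical and chosen only for illustration.

```python
import math


def toy_context_lm(text, context_words):
    """Hypothetical context LM: words in the context list get a higher
    unigram probability (0.2) than all other words (0.01)."""
    return sum(
        math.log(0.2) if word in context_words else math.log(0.01)
        for word in text.split()
    )


def shallow_fusion_score(asr_log_prob, lm_log_prob, lm_weight):
    """Log-linear interpolation of ASR and context LM scores."""
    return asr_log_prob + lm_weight * lm_log_prob


def pick_best(hypotheses, context_words, lm_weight):
    """Return the hypothesis text with the highest fused score.

    hypotheses: list of (text, asr_log_prob) pairs (illustrative values).
    """
    return max(
        hypotheses,
        key=lambda h: shallow_fusion_score(
            h[1], toy_context_lm(h[0], context_words), lm_weight
        ),
    )[0]


# Illustrative n-best list: the ASR model slightly prefers the misspelled name.
hypotheses = [("call john smith", -4.0), ("call jon smyth", -3.8)]
context = {"john", "smith"}

without_fusion = pick_best(hypotheses, context, lm_weight=0.0)
with_fusion = pick_best(hypotheses, context, lm_weight=0.3)
```

With the weight set to zero the ranking follows the ASR scores alone; with a positive weight the hypothesis containing the context words is preferred, which is the effect shallow fusion is meant to have on uncommon words.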