We introduce a novel neural network module that handles recursive data flow in neural network architectures. At its core, the module takes a self-consistent approach: a set of recursive equations is solved iteratively, halting when the difference between two consecutive iterates falls below a defined threshold. Leveraging this mechanism, we construct a new neural network architecture, an extension of the conformer transducer, that enriches automatic speech recognition systems with a stream of contextual information. Our method notably improves the recognition accuracy of rare words without degrading the word error rate on common vocabulary. We evaluate the accuracy gains on these uncommon words using our novel model, both on its own and in conjunction with shallow fusion with a context language model. Our findings reveal that combining both approaches can improve the accuracy of detecting rare words by as much as 4.5 times. The proposed self-consistent recursive methodology is versatile and adaptable, compatible with many recently developed encoders, and has the potential to drive model improvements in speech recognition and beyond.
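The self-consistent mechanism described above, iterating a set of recursive equations until two consecutive iterates agree within a threshold, can be sketched as a generic fixed-point solver. This is a hypothetical illustration of the stopping criterion only; the paper's actual recursive equations and module internals are not given here, so the map `f` and the scalar state are stand-ins:

```python
import math


def fixed_point(f, x0, tol=1e-6, max_iters=100):
    """Iterate x <- f(x) until |x_new - x| < tol (self-consistency).

    `f` is a placeholder for the module's recursive update; here we use a
    scalar map for illustration, whereas the real module would update
    tensor-valued activations.
    """
    x = x0
    for _ in range(max_iters):
        x_new = f(x)
        # Halt when consecutive iterates differ by less than the threshold.
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x  # fall back to the last iterate if tolerance was not reached


# Example: solve the classic fixed-point equation x = cos(x),
# whose solution is approximately 0.739085.
root = fixed_point(math.cos, 1.0)
```

In the neural-network setting, the same loop would be run inside the forward pass, with the convergence threshold trading off compute against how tightly the recursive equations are satisfied.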