Hotword customization remains one of the important open issues in the ASR field: it is valuable to let users of ASR systems customize the names of entities, persons, and other phrases. The past few years have seen the development of both implicit and explicit modeling strategies for ASR contextualization. While these approaches perform adequately, they still exhibit shortcomings, such as unstable effectiveness, especially in non-autoregressive (NAR) ASR models. In this paper we propose Semantic-augmented Contextual-Paraformer (SeACo-Paraformer), a novel NAR-based ASR system with flexible and effective hotword customization ability. It combines the accuracy of the attention-based encoder-decoder (AED) model, the efficiency of the NAR model, and excellent contextualization performance. In experiments on tens of thousands of hours of industrial data, our proposed model outperforms strong baselines in both customization and general ASR tasks. In addition, we explore an efficient way to filter large-scale incoming hotwords for further improvement.