Hotword customization remains an important open problem in automatic speech recognition (ASR): it is valuable to let users of ASR systems customize the names of entities, persons, and other phrases. The past few years have seen the development of both implicit and explicit modeling strategies for ASR contextualization. While these approaches perform adequately, they still exhibit shortcomings such as unstable effectiveness. In this paper, we propose the Semantic-augmented Contextual-Paraformer (SeACo-Paraformer), a novel non-autoregressive (NAR) ASR system with flexible and effective hotword customization ability. It combines the accuracy of attention-based encoder-decoder (AED) models, the efficiency of NAR models, and excellent contextualization performance. In experiments on 50,000 hours of industrial data, the proposed model outperforms strong baselines on both customization and general ASR tasks. In addition, we explore an efficient way to filter large-scale incoming hotwords for further improvement. The source code and the industrial models proposed and compared are all released, along with two hotword test sets.