Hotword customization remains an open problem in automatic speech recognition (ASR): it is valuable to let users of ASR systems customize names of entities, persons, and other phrases to obtain a better experience. The past few years have seen effective modeling strategies developed for ASR contextualization, but they still leave room for improvement in training stability and in the invisible activation process. In this paper we propose Semantic-Augmented Contextual-Paraformer (SeACo-Paraformer), a novel non-autoregressive (NAR) ASR system with flexible and effective hotword customization. It combines the accuracy of attention-based encoder-decoder (AED) models, the efficiency of NAR models, and an explicit customization capacity with superior performance. In extensive experiments on 50,000 hours of industrial data, the proposed model outperforms strong baselines in customization. We also explore an efficient way to filter large-scale incoming hotwords for further improvement. The industrial models compared, the source code, and two hotword test sets are all open source.