Despite the strong progress of end-to-end (E2E) models in speech recognition in recent years, recognizing named entities remains challenging, yet it is critical for semantic understanding. To improve named entity recognition in E2E models, previous studies have mainly focused on rule-based or attention-based contextual biasing algorithms. However, the performance of these methods can be sensitive to the biasing weight or degraded by excessive attention to the named entity list, and they carry a risk of false triggering. Inspired by the success of class-based language models (LMs) for named entity recognition in conventional hybrid systems and by the effective decoupling of acoustic and linguistic information in the factorized neural Transducer (FNT), we propose a novel E2E model that incorporates class-based LMs into FNT, referred to as C-FNT. In C-FNT, the LM score of a named entity can be associated with its name class instead of its surface form. Experimental results show that the proposed C-FNT significantly reduces errors on named entities without hurting general word recognition.
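The abstract does not spell out how a score is tied to a name class rather than a surface form. As a rough illustration of the general class-based LM idea only (not the C-FNT implementation), the sketch below factors a word's log-probability into a class term and a within-class term. The class tag `<contact>`, the member names, and all probabilities are hypothetical values chosen for the example, and the dependence on the history is omitted for brevity.

```python
import math

# Hypothetical toy values, for illustration only.
# log P(class | history); the history is ignored in this sketch.
CLASS_LOGPROB = {"<contact>": math.log(0.05)}

# log P(word | class): a uniform distribution over an assumed entity list.
MEMBER_LOGPROB = {
    "<contact>": {"alice": math.log(0.5), "bob": math.log(0.5)},
}


def class_based_logprob(word: str, name_class: str) -> float:
    """Approximate log P(word | history) as
    log P(class | history) + log P(word | class)."""
    members = MEMBER_LOGPROB[name_class]
    if word not in members:
        # The word is not a member of this class, so the class path gives it no mass.
        return float("-inf")
    return CLASS_LOGPROB[name_class] + members[word]


score = class_based_logprob("alice", "<contact>")
```

Because the history-dependent term is attached to the class, swapping the entity list changes only the within-class table, which is the property that makes class-based LMs attractive for names that are rare or unseen in training text.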