Energy-based language models (ELMs) parameterize an unnormalized distribution over natural sentences and are radically different from the popular autoregressive language models (ALMs). As an important application, ELMs have been successfully used to compute sentence scores in speech recognition, but existing ELMs all rely on older CNN or LSTM networks. Recent progress in Transformer networks and large pretrained models such as BERT and GPT2 opens new possibilities for further advancing ELMs. In this paper, we explore different architectures of energy functions and different training methods to investigate the capabilities of ELMs in rescoring for speech recognition, all using large pretrained models as backbones.
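To make the rescoring use case concrete, the following is a minimal sketch, not the method of this paper. It assumes an ELM assigns each sentence x an energy E(x) with p(x) proportional to exp(-E(x)), so that -E(x) serves as an unnormalized log-probability. Because the normalizing constant is the same for every hypothesis, it cancels when ranking an n-best list. The names `rescore_nbest`, `toy_energy`, and `lm_weight` are hypothetical, and `toy_energy` is a hand-written stand-in for a neural energy function built on a pretrained backbone.

```python
from typing import Callable, List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    energy_fn: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank (text, first_pass_score) pairs, best first.

    The combined score is first_pass_score - lm_weight * E(text).
    Higher is better; the partition function is never needed because
    it shifts every hypothesis by the same constant.
    """
    rescored = [
        (text, first_pass - lm_weight * energy_fn(text))
        for text, first_pass in nbest
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_energy(sentence: str) -> float:
    """Hypothetical stand-in energy: one unit per immediately repeated word."""
    words = sentence.split()
    return float(sum(1 for a, b in zip(words, words[1:]) if a == b))


# Assumed first-pass scores (log-domain, higher is better) for two hypotheses.
nbest = [("the the cat sat", -1.0), ("the cat sat", -1.2)]
best_text, best_score = rescore_nbest(nbest, toy_energy, lm_weight=1.0)[0]
```

With `lm_weight=0.0` the first-pass ranking is returned unchanged, so the weight controls how strongly the energy term can overturn the recognizer's original ordering.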