Transferring knowledge from large language models (LLMs) is a promising way to incorporate linguistic knowledge into end-to-end automatic speech recognition (ASR) systems. However, existing work transfers only a single LLM representation (e.g., the last layer of a pretrained BERT), even though the representation of a text is inherently non-unique and can be obtained in various ways from different layers, contexts, and models. In this work, we explore a wide range of techniques for obtaining multiple LLM representations and transferring them into a transducer-based ASR system. Although the approach is conceptually simple, we show that transferring multiple LLM representations can be an effective alternative to transferring only a single representation.