This work brings together two popular building blocks of neural architectures, namely convolutional layers and Transformers, for large language models (LLMs). Non-causal Conformers are used ubiquitously in automatic speech recognition; this work aims to adapt these architectures to a causal setup for training LLMs. Transformer decoders effectively capture long-range dependencies across several modalities and form a core backbone of modern advances in machine learning. Convolutional architectures have been popular for extracting features in domains such as raw 1-D signals, speech, and images, to name a few. In this paper, by combining local dependencies (captured by causal convolutional filters) and global dependencies (captured by Transformer attention) over latent representations, we achieve significant gains in performance. This work shows how a robust speech architecture can be integrated and adapted in a causal setup beyond speech applications, for large-scale language modeling.
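To make the causal constraint concrete, the following is a minimal NumPy sketch of one way to pair global and local mixing: single-head masked self-attention followed by a depthwise causal convolution, each with a residual connection. It is an illustration under stated assumptions, not the paper's architecture. The feed-forward, normalization, and gating modules of a full Conformer block are omitted, the ordering is arbitrary, and the function names (`causal_depthwise_conv`, `causal_self_attention`, `causal_conv_attention_block`) are hypothetical.

```python
import numpy as np

def causal_depthwise_conv(x, kernel):
    """Depthwise 1-D convolution where output t sees only inputs t-K+1..t.

    x: (T, D) latent sequence; kernel: (K, D), one filter per channel.
    """
    K = kernel.shape[0]
    # Left-pad with K-1 zero rows so no future position enters any window.
    padded = np.concatenate([np.zeros((K - 1, x.shape[1])), x], axis=0)
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        window = padded[t:t + K]  # covers original positions t-K+1 .. t
        out[t] = np.sum(window * kernel, axis=0)
    return out

def causal_self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention with a causal mask."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    T = x.shape[0]
    # True strictly above the diagonal marks future positions to hide.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over each row (diagonal is never masked).
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

def causal_conv_attention_block(x, params):
    """Hypothetical block: global context via attention, local via conv."""
    x = x + causal_self_attention(x, params["wq"], params["wk"], params["wv"])
    x = x + causal_depthwise_conv(x, params["kernel"])
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, K = 6, 4, 3
    params = {name: rng.normal(size=(D, D)) for name in ("wq", "wk", "wv")}
    params["kernel"] = rng.normal(size=(K, D))
    x = rng.normal(size=(T, D))
    y = causal_conv_attention_block(x, params)
    # Causality check: perturbing the last token leaves earlier outputs unchanged.
    x_perturbed = x.copy()
    x_perturbed[-1] += 10.0
    y_perturbed = causal_conv_attention_block(x_perturbed, params)
    print(np.allclose(y[:-1], y_perturbed[:-1]))
```

The left padding in the convolution and the upper-triangular mask in attention play the same role: both guarantee that the representation at position t depends only on positions up to t, which is what allows a speech-style conv-plus-attention block to be trained as an autoregressive language model.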