Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in the rapidly evolving field of natural language processing. Current innovations, including State Space Models, were initially celebrated for surpassing Transformer performance on language modeling tasks. However, these models have revealed deficiencies in In-Context Learning, an essential capability in which the Transformer traditionally excels. The Based model emerged as a hybrid solution, combining a Linear Transformer, whose kernel is inspired by the Taylor expansion of the exponential function, with convolutional networks. By mirroring the Transformer's in-context abilities, it became a strong contender in the field. In this work, we present a single, elegant modification to the Based kernel that improves both its In-Context Learning ability, evaluated on the Multi-Query Associative Recall task, and its overall language modeling performance, as demonstrated on the Pile dataset.
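To make the Taylor-expansion kernel more concrete, the following is a minimal Python sketch of a Based-style *baseline* feature map. It is not the modification proposed in this work. It assumes the second-order expansion exp(q·k) ≈ 1 + q·k + (q·k)²/2, realised through the feature map φ(x) = [1, x, (x ⊗ x)/√2]. For simplicity it omits any 1/√d scaling, feature-dimension projection, and the convolutional component. All function names (`taylor_feature_map`, `causal_linear_attention`, `causal_quadratic_attention`) are hypothetical and chosen only for illustration.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def taylor_feature_map(x: Vector) -> Vector:
    """Hypothetical 2nd-order Taylor feature map: phi(q).phi(k) = 1 + q.k + (q.k)^2 / 2."""
    d = len(x)
    second_order = [x[i] * x[j] / math.sqrt(2.0) for i in range(d) for j in range(d)]
    return [1.0] + list(x) + second_order


def causal_linear_attention(queries: List[Vector], keys: List[Vector],
                            values: List[Vector]) -> List[Vector]:
    """Recurrent O(n) form: keep running sums S = sum phi(k_j) v_j^T and z = sum phi(k_j)."""
    feat_dim = len(taylor_feature_map(keys[0]))
    v_dim = len(values[0])
    state = [[0.0] * v_dim for _ in range(feat_dim)]  # S, shape (feat_dim, v_dim)
    norm = [0.0] * feat_dim                            # z, shape (feat_dim,)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = taylor_feature_map(k)
        # Fold the current key/value into the state (causal, includes position i).
        for a in range(feat_dim):
            norm[a] += phi_k[a]
            for b in range(v_dim):
                state[a][b] += phi_k[a] * v[b]
        phi_q = taylor_feature_map(q)
        # 1 + s + s^2/2 > 0 for every real s, so the denominator is positive.
        denom = dot(phi_q, norm)
        outputs.append([sum(phi_q[a] * state[a][b] for a in range(feat_dim)) / denom
                        for b in range(v_dim)])
    return outputs


def causal_quadratic_attention(queries: List[Vector], keys: List[Vector],
                               values: List[Vector]) -> List[Vector]:
    """Reference O(n^2) form that evaluates the polynomial kernel explicitly."""
    outputs = []
    v_dim = len(values[0])
    for i, q in enumerate(queries):
        weights = [1.0 + dot(q, k) + dot(q, k) ** 2 / 2.0 for k in keys[: i + 1]]
        total = sum(weights)
        outputs.append([sum(w * v[b] for w, v in zip(weights, values[: i + 1])) / total
                        for b in range(v_dim)])
    return outputs
```

The two attention functions should agree up to floating-point error. The linear form stores a fixed-size state of (1 + d + d²) × d_v numbers instead of attending over all previous tokens, which is what makes this family of kernels subquadratic in sequence length. The d² growth of the feature dimension is also why practical implementations typically project queries and keys to a small dimension first.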