One of the core pillars of efficient deep learning methods is architectural improvements such as the residual/skip connection, which has led to significantly better model convergence and quality. Since its introduction, the residual connection has become ubiquitous not only in convolutional neural networks but also in transformer-based architectures, the backbone of LLMs. In this paper we introduce the \emph{Learned Augmented Residual Layer} (LAuReL) -- a novel generalization of the canonical residual connection -- with the goal of serving as an in-situ replacement for the latter while outperforming it on both model quality and footprint metrics. Our experiments show that using \laurel can boost performance for both vision and language models. For example, on the ResNet-50 ImageNet-1K task, it achieves $60\%$ of the gains from adding an extra layer while adding only $0.003\%$ more parameters, and matches those gains while adding $2.6\times$ fewer parameters.
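To make the contrast concrete, below is a minimal sketch of a canonical residual connection next to a learned-weight generalization of the kind the abstract describes. The scalars `alpha` and `beta` and the function names are illustrative assumptions for exposition, not the paper's exact formulation:

```python
import numpy as np

def residual_block(f, x):
    # Canonical residual/skip connection: output is f(x) + x.
    return f(x) + x

def learned_residual_block(f, x, alpha, beta):
    # Hypothetical learned-weight generalization (assumption, not the
    # paper's exact LAuReL formulation): alpha and beta would be trained
    # with the network; alpha = beta = 1 recovers the canonical case.
    return alpha * f(x) + beta * x

f = lambda x: 0.5 * x  # stand-in for a layer's transformation
x = np.ones(4)
canonical = residual_block(f, x)
generalized = learned_residual_block(f, x, alpha=1.0, beta=1.0)
```

With `alpha = beta = 1` the generalized form reduces exactly to the canonical residual connection, which is what makes it a drop-in (in-situ) replacement.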