This article focuses on fine-tuning large language models (LLMs) in the scarce-data regime, also known as the "few-shot" learning setting. We propose a method based on neural network subspaces to improve the generalization capabilities of LLMs. This optimization method, recently introduced in computer vision, aims to improve model generalization by identifying wider local optima through the joint optimization of an entire simplex of models in parameter space. Adapting it to massive pretrained transformers poses two challenges. First, their considerable number of parameters makes it difficult to train several models jointly. Second, their deterministic parameter initialization schemes make them unfit for the subspace method as originally proposed. We show in this paper that "Parameter-Efficient Fine-Tuning" (PEFT) methods are nonetheless fully compatible with this original approach, and we propose to learn an entire simplex of continuous prefixes. We test our method on a variant of the GLUE benchmark adapted to the few-shot learning setting, and show that our two contributions together lead to a gain in average performance compared to state-of-the-art methods. The implementation can be found at the following link: https://github.com/Liloulou/prefix_subspace
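To make the idea of a simplex of continuous prefixes concrete, the following is a minimal sketch and not the implementation from the linked repository. It assumes that each vertex of the simplex is a small prefix matrix and that a prefix is obtained as a convex combination of the vertices, with weights drawn uniformly from the probability simplex. The names `sample_simplex_weights` and `mix_prefixes`, the toy shapes, and the uniform sampling scheme are all illustrative assumptions.

```python
import random


def sample_simplex_weights(num_vertices, rng):
    """Draw weights uniformly from the probability simplex.

    Normalizing i.i.d. exponential draws yields a Dirichlet(1, ..., 1) sample.
    """
    draws = [rng.expovariate(1.0) for _ in range(num_vertices)]
    total = sum(draws)
    return [d / total for d in draws]


def mix_prefixes(vertices, weights):
    """Return the convex combination of prefix matrices.

    Each vertex is a list of rows (prefix_len x hidden_dim); the shapes are
    hypothetical and kept tiny for illustration.
    """
    prefix_len = len(vertices[0])
    hidden_dim = len(vertices[0][0])
    return [
        [sum(w * v[i][j] for w, v in zip(weights, vertices))
         for j in range(hidden_dim)]
        for i in range(prefix_len)
    ]


rng = random.Random(0)

# Three hypothetical simplex vertices, each a 2 x 4 prefix matrix.
vertices = [
    [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2)]
    for _ in range(3)
]

# One point inside the simplex: the prefix that would be fed to the frozen model.
weights = sample_simplex_weights(len(vertices), rng)
prefix = mix_prefixes(vertices, weights)
```

In a training loop built on this sketch, one would presumably draw fresh weights at each step and propagate gradients back to all vertices, so that only the prefix vertices are trained while the pretrained model stays frozen. That loop is omitted here because it depends on the model and framework used.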