This paper introduces a method for adapting LoRA adapters in smaller language models to arbitrary downstream tasks. Unlike standard mixture-of-experts architectures, our method employs a gradient-free routing function to choose a weighted combination of experts without increasing the compute requirements for training or inference. The results show that token-level adaptation of LoRA adapters outperforms the base Llama-2-7b model across mathematical (GSM8K), scientific (ARC-Challenge), reading comprehension (SQuAD), and coding (CodeAlpaca-20k) tasks. Further evaluations show that, on average, token-level adaptation also outperforms the individual models fine-tuned for each task, with the best performance observed when adaptation is applied at every other token during inference. The code for this study is available in a public repository.
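As an illustration of what a gradient-free routing function over LoRA experts could look like, the following is a minimal sketch and not the implementation from this study. It assumes that routing weights come from a softmax over cosine similarities between a query embedding and one centroid embedding per expert; the abstract does not specify the routing function, so this choice, the `temperature` parameter, and all function names are hypothetical. The `should_reroute` helper only illustrates the idea of recomputing the combination at every other token.

```python
import numpy as np


def routing_weights(query, centroids, temperature=1.0):
    """Gradient-free routing (assumed form): softmax over cosine similarities.

    query:     (d,) embedding of the current context (hypothetical input)
    centroids: (n_experts, d) one reference embedding per expert (assumption)
    """
    q = query / np.linalg.norm(query)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    logits = (c @ q) / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()


def combine_lora(weights, lora_A, lora_B):
    """Weighted sum of low-rank updates: sum_e w_e * (B_e @ A_e).

    lora_A: (n_experts, r, d_in), lora_B: (n_experts, d_out, r)
    Returns a single (d_out, d_in) weight update.
    """
    return np.einsum("e,eor,eri->oi", weights, lora_B, lora_A)


def should_reroute(step, interval=2):
    """True at steps where the expert combination is recomputed
    (interval=2 corresponds to every other token)."""
    return step % interval == 0
```

Because the routing weights are computed from fixed embeddings rather than learned parameters, no gradient step is needed for the router in this sketch, and the combined update has the same shape as a single LoRA update.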