Parameter-efficient tuning (PEFT) techniques like low-rank adaptation (LoRA) offer training efficiency on Large Language Models, but their impact on model performance remains limited. Recent efforts integrate LoRA and Mixture-of-Experts (MoE) to improve the performance of PEFT methods. Despite promising results, research on improving the efficiency of LoRA with MoE is still in its early stages. Recent studies have shown that experts in the MoE architecture have different strengths and also exhibit some redundancy. Does this observation also apply to parameter-efficient MoE? In this paper, we introduce a novel parameter-efficient MoE method, \textit{\textbf{M}oE-L\textbf{o}RA with \textbf{L}ayer-wise Expert \textbf{A}llocation (MoLA)} for Transformer-based models, where each model layer has the flexibility to employ a varying number of LoRA experts. We investigate several architectures with varying layer-wise expert configurations. Experiments on six well-known NLP and commonsense QA benchmarks demonstrate that MoLA achieves equal or superior performance compared to all baselines. We find that, for a fixed total number of experts, allocating more LoRA experts to higher layers further improves model effectiveness; this allocation strategy outperforms the setting with the same number of experts in every layer while using far fewer parameters. This work can be widely used as a plug-and-play parameter-efficient tuning approach for various applications. The code is available at https://github.com/GCYZSL/MoLA.
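To make the layer-wise allocation idea concrete, below is a minimal sketch (not the authors' implementation) of a frozen linear layer augmented with a router over a layer-specific number of LoRA experts. The class names (`LoRAExpert`, `MoLALinear`), the top-k routing choice, and the example allocation list are illustrative assumptions; the released code at the repository above is the reference implementation.

```python
# Hypothetical sketch of layer-wise LoRA-expert allocation (MoLA-style).
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One low-rank adapter: x -> (alpha / r) * B(A(x)), with B initialized to zero."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # the adapter starts as a zero update
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.lora_B(self.lora_A(x)) * self.scaling


class MoLALinear(nn.Module):
    """Frozen base linear layer plus a token-level router over LoRA experts.

    `num_experts` can differ per Transformer layer, which is the layer-wise
    allocation described in the abstract.
    """

    def __init__(self, base: nn.Linear, num_experts: int, top_k: int = 2, r: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only experts and router are trained
        self.experts = nn.ModuleList(
            LoRAExpert(base.in_features, base.out_features, r) for _ in range(num_experts)
        )
        self.router = nn.Linear(base.in_features, num_experts, bias=False)
        self.top_k = min(top_k, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        gate_logits = self.router(x)                         # (..., num_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)  # route each token to top-k experts
        weights = F.softmax(weights, dim=-1)
        for e, expert in enumerate(self.experts):
            # combined routing weight of expert e over the selected top-k slots
            w = (weights * (idx == e)).sum(dim=-1, keepdim=True)
            out = out + w * expert(x)
        return out


# Example allocation for a 32-layer model: fewer experts in lower layers,
# more in higher layers (illustrative numbers only).
allocation = [2] * 8 + [2] * 8 + [4] * 8 + [8] * 8
```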