UniPool: A Globally Shared Expert Pool for Mixture-of-Experts


Minbin Huang, Han Shi, Chuanyang Zheng, Yimeng Wu, Guoxuan Chen, Xintong Yu, Yichun Yin, Hong Cheng

Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and assumes that every layer needs isolated expert capacity. However, recent analyses and our routing probe challenge this allocation rule: replacing a deeper layer's learned top-k router with uniform random routing drops downstream accuracy by only 1.0-1.6 points across multiple production MoE models. Motivated by this redundancy, we propose UniPool, an MoE architecture that treats expert capacity as a global architectural budget by replacing per-layer expert ownership with a single shared pool accessed by independent per-layer routers. To enable stable and balanced training under sharing, we introduce a pool-level auxiliary loss that balances expert utilization across the entire pool, and adopt NormRouter to provide sparse and scale-stable routing into the shared expert pool. Across five LLaMA-architecture model scales (182M, 469M, 650M, 830M, and 978M parameters) trained on 30B tokens from the Pile, UniPool consistently improves validation loss and perplexity over the matched vanilla MoE baselines. Across these scales, UniPool reduces validation loss by up to 0.0386 relative to vanilla MoE. Beyond raw loss improvement, our results identify pool size as an explicit depth-scaling hyperparameter: reduced-pool UniPool variants using only 41.6%-66.7% of the vanilla expert-parameter budget match or outperform layer-wise MoE at the tested scales. This shows that, under a shared-pool design, expert parameters need not grow linearly with depth; they can grow sublinearly while remaining more efficient and effective than vanilla MoE. Further analysis shows that UniPool's benefits compose with finer-grained expert decomposition.
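To make the shared-pool idea concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes one router per layer scoring the same global pool of experts, and a balance term computed over the whole pool rather than per layer. The abstract does not specify NormRouter or the exact loss, so the sketch substitutes a plain softmax top-k router and a Switch-style balance term. All names and sizes (`route_top_k`, `pool_balance_loss`, `expert_budget_fraction`, `DIM`, `LAYERS`, `POOL`, `K`) are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router, token, k):
    """Score every pool expert with one layer's router and keep the top-k.

    Assumption: a plain softmax router stands in for NormRouter.
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
    return top, probs


def pool_balance_loss(routers, tokens, k):
    """Assumed Switch-style balance term, aggregated over all layers.

    f[e]: fraction of all (layer, token, slot) assignments sent to expert e.
    p[e]: mean router probability of expert e over all layers and tokens.
    Returns (pool_size * sum_e f[e] * p[e], f).
    """
    pool_size = len(routers[0])
    counts = [0] * pool_size
    prob_sums = [0.0] * pool_size
    for router in routers:      # independent router per layer
        for token in tokens:    # simplification: same inputs at every layer
            top, probs = route_top_k(router, token, k)
            for e in top:
                counts[e] += 1
            for e, p in enumerate(probs):
                prob_sums[e] += p
    n = len(routers) * len(tokens)
    f = [c / (n * k) for c in counts]
    p = [s / n for s in prob_sums]
    return pool_size * sum(fe * pe for fe, pe in zip(f, p)), f


def expert_budget_fraction(pool_size, num_layers, experts_per_layer):
    """Shared-pool expert count relative to a per-layer (vanilla) layout."""
    return pool_size / (num_layers * experts_per_layer)


rng = random.Random(0)
DIM, LAYERS, POOL, K = 8, 4, 16, 2  # hypothetical sizes, not from the paper
routers = [[[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(POOL)]
           for _ in range(LAYERS)]
tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(32)]
loss, load = pool_balance_loss(routers, tokens, K)
```

In this sketch, depth and expert count are decoupled: adding a layer adds only a router, while `POOL` stays a separate knob. For instance, a pool of 16 experts shared by 4 layers that would otherwise own 8 experts each uses `expert_budget_fraction(16, 4, 8)`, i.e. half of the per-layer expert count.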
