Understanding and Improving Shampoo and SOAP via Kullback-Leibler Minimization - 专知论文

会员服务 ·

0

Adam · 估计/估计量 · 可理解性 · Performer · motivation ·

Understanding and Improving Shampoo and SOAP via Kullback-Leibler Minimization

翻译：暂无翻译

Wu Lin,Scott C. Lowe,Felix Dangel,Runa Eschenhagen,Zikun Xu,Roger B. Grosse

from arxiv, an extended version of the ICLR 2026 paper (added a paragraph in Sec 3.2 about short-sided KL-Shampoo as scaled Muon when momentum is disabled)

Shampoo and its efficient variant, SOAP, employ structured second-moment estimations and have shown strong performance for training neural networks (NNs). In practice, however, Shampoo typically requires step-size grafting with Adam to be competitive, and SOAP mitigates this by applying Adam in Shampoo's eigenbasis -- at the cost of additional memory overhead from Adam in both methods. Prior analyses have largely relied on the Frobenius norm to motivate these estimation schemes. We instead recast their estimation procedures as covariance estimation under Kullback-Leibler (KL) divergence minimization, revealing a previously overlooked theoretical limitation and motivating principled redesigns. Building on this perspective, we develop $\textbf{KL-Shampoo}$ and $\textbf{KL-SOAP}$, practical schemes that match or exceed the performance of Shampoo and SOAP in NN pre-training while achieving SOAP-level per-iteration runtime. Notably, KL-Shampoo does not rely on Adam to attain competitive performance, eliminating the memory overhead introduced by Adam. Across our experiments, KL-Shampoo consistently outperforms SOAP, Shampoo, and even KL-SOAP, establishing the KL-based approach as a promising foundation for designing structured methods in NN optimization. An implementation of KL-Shampoo/KL-SOAP is available at https://github.com/yorkerlin/KL-Methods

翻译：暂无翻译

0

相关内容

Adam

《高超声速吸气推进地面试验综述与NASA强化喷射混合项目》NASA最新36页slides

《高超声速吸气推进地面试验综述与NASA强化喷射混合项目》NASA最新36页slides

专知会员服务

10+阅读 · 2025年7月11日

【NeurIPS 2022】Stable Diffusion采样速度翻倍！清华提出扩散模型高效求解器

【NeurIPS 2022】Stable Diffusion采样速度翻倍！清华提出扩散模型高效求解器

专知会员服务

49+阅读 · 2022年11月17日

【CVPR 2022】连续驾驶场景与不断增长的建筑的连续立体匹配，Continual Stereo Matching of Continuous Driving Scenes with Growing Architecture

【CVPR 2022】连续驾驶场景与不断增长的建筑的连续立体匹配，Continual Stereo Matching of Continuous Driving Scenes with Growing Architecture

专知会员服务

11+阅读 · 2022年3月12日

EMNLP 2021 | RocketQAv2：稠密段落检索和段落精排的联合训练方法

EMNLP 2021 | RocketQAv2：稠密段落检索和段落精排的联合训练方法

专知会员服务

12+阅读 · 2021年10月24日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【泡泡图灵智库】基于上采样预积分测量值的3D Lidar-IMU校准来矫正运动失真

【泡泡图灵智库】基于上采样预积分测量值的3D Lidar-IMU校准来矫正运动失真

泡泡机器人SLAM

11+阅读 · 2019年9月17日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习与NLP

10+阅读 · 2019年2月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hands-on Machine Learning with Scikit-Learn and TensorFlow 学习笔记

Hands-on Machine Learning with Scikit-Learn and TensorFlow 学习笔记

AINLP

12+阅读 · 2018年11月12日

用 LDA 和 LSA 两种方法来降维和做 Topic 建模

用 LDA 和 LSA 两种方法来降维和做 Topic 建模

AI研习社

13+阅读 · 2018年8月24日

手把手教 | 深度学习库PyTorch（附代码）

手把手教 | 深度学习库PyTorch（附代码）

数据派THU

27+阅读 · 2018年3月15日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

压电复合结构振动俘能与主动控制同步实现的理论与试验研究

国家自然科学基金

0+阅读 · 2015年12月31日

液压自由活塞发动机电控压缩冲程机理与能量回收的基础研究

国家自然科学基金

0+阅读 · 2015年12月31日

三维Ti基PbO2+nano-MxOy复合材料的设计制备、性能及储能机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

高阶变形非圆齿轮络交机构设计方法及试验研究

国家自然科学基金

0+阅读 · 2015年12月31日

3D多孔结构LiMnPO4•LiVPO4F@石墨烯气凝胶复合物材料的构筑及电化学性能研究

国家自然科学基金

0+阅读 · 2015年12月31日

新型刚—柔复合高可靠精密传动应用基础研究

国家自然科学基金

0+阅读 · 2015年12月31日

多体动力学的多尺度保辛摄动理论与算法

国家自然科学基金

0+阅读 · 2014年12月31日

阻尼式泡沫橡胶路面减速机理与行为表达

国家自然科学基金

0+阅读 · 2014年12月31日

高速铁路轨道－路基服役性能演变与状态控制技术研究

国家自然科学基金

1+阅读 · 2014年12月31日

环氧树脂基交联网络微观结构调控及其热致形状记忆构效关系

国家自然科学基金

0+阅读 · 2014年12月31日

DynamicPO: Dynamic Preference Optimization for Recommendation

Arxiv

0+阅读 · 6月23日

Collapsed Effective Operators for Higher-order Structures

Collapsed Effective Operators for Higher-order Structures

Arxiv

0+阅读 · 6月22日

Adaptive GoGI-Skip: Coupling Goal-Gradient Importance with Dynamic Uncertainty for Efficient Reasoning

Arxiv

0+阅读 · 6月22日

Beyond Penalizing Mistakes: Stabilizing Efficiency Training in Large Reasoning Models via Adaptive Correct-Only Rewards

Arxiv

0+阅读 · 6月21日

Stabilizing Consistency Training: A Flow Map Analysis and Self-Distillation

Arxiv

0+阅读 · 6月20日

CoLI: A Reproducible Platform for Continuum Robot Learning via Monolithic 3D Printing and Isomorphic Teleoperation

Arxiv

0+阅读 · 6月18日

Robust and Efficient MuJoCo-based Model Predictive Control via Web of Affine Spaces Derivatives

Arxiv

0+阅读 · 6月17日

Directed Reachability-Preserving Minimum Edge Cut: Approximation and Planar Hardness

Arxiv

0+阅读 · 6月16日

Coarse Preference Reporting in the Bottleneck Model: Approximate Strategyproofness and Efficiency

Arxiv

0+阅读 · 6月16日

Beyond Importance: Interchange-Sobol Sensitivity Reveals Task-Specific Content Channels in Transformer Components

Arxiv

0+阅读 · 6月12日

VIP会员

文章信息

相关主题

估计/估计量

最新内容

五角大楼启动“智能体网络”以推进人工智能赋能的战斗管理与目标打击

五角大楼启动“智能体网络”以推进人工智能赋能的战斗管理与目标打击

专知会员服务

8+阅读 · 6月27日

2025年全球二十起重大无人机作战事件

2025年全球二十起重大无人机作战事件

专知会员服务

2+阅读 · 6月27日

现代战争的隐蔽系统：伊朗战争十大启示

现代战争的隐蔽系统：伊朗战争十大启示

专知会员服务

3+阅读 · 6月27日

ICML 2026 | 自回归Boltzmann生成器重塑分子采样

ICML 2026 | 自回归Boltzmann生成器重塑分子采样

专知会员服务

5+阅读 · 6月26日

GNN跨域综述：从消息传递到图基础模型

GNN跨域综述：从消息传递到图基础模型

专知会员服务

8+阅读 · 6月26日

无人机自主控制与人工智能：系统性综述

无人机自主控制与人工智能：系统性综述

专知会员服务

15+阅读 · 6月26日

巡飞弹与反无人机系统——现代战场的两大支柱

巡飞弹与反无人机系统——现代战场的两大支柱

专知会员服务

5+阅读 · 6月26日

《打造“黄金舰队”》57页报告

《打造“黄金舰队”》57页报告

专知会员服务

4+阅读 · 6月26日

《北约数字教官网络发展路径》128页报告

《北约数字教官网络发展路径》128页报告

专知会员服务

3+阅读 · 6月26日

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

专知会员服务

8+阅读 · 6月25日

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

专知会员服务

7+阅读 · 6月25日

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

专知会员服务

10+阅读 · 6月25日

网状网络及其在军事领域的运用

网状网络及其在军事领域的运用

专知会员服务

9+阅读 · 6月25日

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

专知会员服务

9+阅读 · 6月25日

无美国参与的欧洲战争方式（万字长文）

无美国参与的欧洲战争方式（万字长文）

专知会员服务

8+阅读 · 6月25日

相关VIP内容

《高超声速吸气推进地面试验综述与NASA强化喷射混合项目》NASA最新36页slides

《高超声速吸气推进地面试验综述与NASA强化喷射混合项目》NASA最新36页slides

专知会员服务

10+阅读 · 2025年7月11日

【NeurIPS 2022】Stable Diffusion采样速度翻倍！清华提出扩散模型高效求解器

【NeurIPS 2022】Stable Diffusion采样速度翻倍！清华提出扩散模型高效求解器

专知会员服务

49+阅读 · 2022年11月17日

【CVPR 2022】连续驾驶场景与不断增长的建筑的连续立体匹配，Continual Stereo Matching of Continuous Driving Scenes with Growing Architecture

【CVPR 2022】连续驾驶场景与不断增长的建筑的连续立体匹配，Continual Stereo Matching of Continuous Driving Scenes with Growing Architecture

专知会员服务

11+阅读 · 2022年3月12日

EMNLP 2021 | RocketQAv2：稠密段落检索和段落精排的联合训练方法

EMNLP 2021 | RocketQAv2：稠密段落检索和段落精排的联合训练方法

专知会员服务

12+阅读 · 2021年10月24日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

2025年全球二十起重大无人机作战事件

ICML 2026 | 自回归Boltzmann生成器重塑分子采样

五角大楼启动“智能体网络”以推进人工智能赋能的战斗管理与目标打击

现代战争的隐蔽系统：伊朗战争十大启示

相关资讯

【泡泡图灵智库】基于上采样预积分测量值的3D Lidar-IMU校准来矫正运动失真

【泡泡图灵智库】基于上采样预积分测量值的3D Lidar-IMU校准来矫正运动失真

泡泡机器人SLAM

11+阅读 · 2019年9月17日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习与NLP

10+阅读 · 2019年2月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hands-on Machine Learning with Scikit-Learn and TensorFlow 学习笔记

Hands-on Machine Learning with Scikit-Learn and TensorFlow 学习笔记

AINLP

12+阅读 · 2018年11月12日

用 LDA 和 LSA 两种方法来降维和做 Topic 建模

用 LDA 和 LSA 两种方法来降维和做 Topic 建模

AI研习社

13+阅读 · 2018年8月24日

手把手教 | 深度学习库PyTorch（附代码）

手把手教 | 深度学习库PyTorch（附代码）

数据派THU

27+阅读 · 2018年3月15日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

相关论文

DynamicPO: Dynamic Preference Optimization for Recommendation

Arxiv

0+阅读 · 6月23日

Collapsed Effective Operators for Higher-order Structures

Collapsed Effective Operators for Higher-order Structures

Arxiv

0+阅读 · 6月22日

Adaptive GoGI-Skip: Coupling Goal-Gradient Importance with Dynamic Uncertainty for Efficient Reasoning

Arxiv

0+阅读 · 6月22日

Beyond Penalizing Mistakes: Stabilizing Efficiency Training in Large Reasoning Models via Adaptive Correct-Only Rewards

Arxiv

0+阅读 · 6月21日

Stabilizing Consistency Training: A Flow Map Analysis and Self-Distillation

Arxiv

0+阅读 · 6月20日

CoLI: A Reproducible Platform for Continuum Robot Learning via Monolithic 3D Printing and Isomorphic Teleoperation

Arxiv

0+阅读 · 6月18日

Robust and Efficient MuJoCo-based Model Predictive Control via Web of Affine Spaces Derivatives

Arxiv

0+阅读 · 6月17日

Directed Reachability-Preserving Minimum Edge Cut: Approximation and Planar Hardness

Arxiv

0+阅读 · 6月16日

Coarse Preference Reporting in the Bottleneck Model: Approximate Strategyproofness and Efficiency

Arxiv

0+阅读 · 6月16日

Beyond Importance: Interchange-Sobol Sensitivity Reveals Task-Specific Content Channels in Transformer Components

Arxiv

0+阅读 · 6月12日

相关基金

压电复合结构振动俘能与主动控制同步实现的理论与试验研究

国家自然科学基金

0+阅读 · 2015年12月31日

液压自由活塞发动机电控压缩冲程机理与能量回收的基础研究

国家自然科学基金

0+阅读 · 2015年12月31日

三维Ti基PbO2+nano-MxOy复合材料的设计制备、性能及储能机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

高阶变形非圆齿轮络交机构设计方法及试验研究

国家自然科学基金

0+阅读 · 2015年12月31日

3D多孔结构LiMnPO4•LiVPO4F@石墨烯气凝胶复合物材料的构筑及电化学性能研究

国家自然科学基金

0+阅读 · 2015年12月31日

新型刚—柔复合高可靠精密传动应用基础研究

国家自然科学基金

0+阅读 · 2015年12月31日

多体动力学的多尺度保辛摄动理论与算法

国家自然科学基金

0+阅读 · 2014年12月31日

阻尼式泡沫橡胶路面减速机理与行为表达

国家自然科学基金

0+阅读 · 2014年12月31日

高速铁路轨道－路基服役性能演变与状态控制技术研究

国家自然科学基金

1+阅读 · 2014年12月31日

环氧树脂基交联网络微观结构调控及其热致形状记忆构效关系

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员