Analysis and Comparison of Two-Level KFAC Methods for Training Deep Neural Networks - 专知论文

会员服务 ·

0

神经网络训练 · 近似 · 神经网络 · 深度神经网络 · 相互作用 ·

2023 年 4 月 3 日

Analysis and Comparison of Two-Level KFAC Methods for Training Deep Neural Networks

翻译：分析并比较两级KFAC方法在深度神经网络训练中的应用

Abdoulaye Koroko,Ani Anciaux-Sedrakian,Ibtihel Ben Gharbia,Valérie Garès,Mounir Haddou,Quang Huy Tran

from arxiv, Under Review

As a second-order method, the Natural Gradient Descent (NGD) has the ability to accelerate training of neural networks. However, due to the prohibitive computational and memory costs of computing and inverting the Fisher Information Matrix (FIM), efficient approximations are necessary to make NGD scalable to Deep Neural Networks (DNNs). Many such approximations have been attempted. The most sophisticated of these is KFAC, which approximates the FIM as a block-diagonal matrix, where each block corresponds to a layer of the neural network. By doing so, KFAC ignores the interactions between different layers. In this work, we investigate the interest of restoring some low-frequency interactions between the layers by means of two-level methods. Inspired from domain decomposition, several two-level corrections to KFAC using different coarse spaces are proposed and assessed. The obtained results show that incorporating the layer interactions in this fashion does not really improve the performance of KFAC. This suggests that it is safe to discard the off-diagonal blocks of the FIM, since the block-diagonal approach is sufficiently robust, accurate and economical in computation time.

翻译：作为二阶方法，自然梯度下降（NGD）具有加速神经网络训练的能力。然而，由于计算和存储Fisher信息矩阵（FIM）及其逆矩阵的高昂代价，需要高效的近似方法才能使NGD可扩展至深度神经网络（DNN）。已有多种此类近似尝试，其中最为精密的是KFAC，它将FIM近似为块对角矩阵，每个对角块对应神经网络的一个层。通过这种方式，KFAC忽略了不同层之间的相互作用。在本工作中，我们通过两级方法研究恢复层间低频相互作用的可行性。受领域分解启发，我们提出并评估了多种使用不同粗网格修正的两级KFAC方法。结果表明，以这种方式纳入层间相互作用并不能真正提升KFAC的性能。这提示我们，忽略FIM的非对角块是安全的，因为块对角方法在计算时间上足够稳健、精确且高效。

0

相关内容

神经网络训练

神经网络训练

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

如何使用TensorFlow 排序构建推荐系统? How to build a recommendation system using TensorFlow Ranking?

如何使用TensorFlow 排序构建推荐系统? How to build a recommendation system using TensorFlow Ranking?

专知会员服务

19+阅读 · 2022年3月13日

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

105+阅读 · 2022年2月10日

【干货书】面向计算科学和工程的Python导论，167页pdf

【干货书】面向计算科学和工程的Python导论，167页pdf

专知会员服务

42+阅读 · 2021年4月7日

【Google】平滑对抗训练，Smooth Adversarial Training

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

49+阅读 · 2020年7月4日

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

【开放新书】可验证深度学习，91页pdf阐述Deep Learning的鲁棒性，提升安全可靠性

【开放新书】可验证深度学习，91页pdf阐述Deep Learning的鲁棒性，提升安全可靠性

专知会员服务

61+阅读 · 2020年4月11日

【ICLR2020】用实对二进制卷积训练二进制神经网络，Training Binary Neural Networks with Real-to-Binary Convolutions

【ICLR2020】用实对二进制卷积训练二进制神经网络，Training Binary Neural Networks with Real-to-Binary Convolutions

专知会员服务

26+阅读 · 2020年3月26日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【UCLA论文】利用神经网络进行时间序列预测的综合分析，Comprehensive Analysis of Time Series Forecasting Using Neural Networks

【UCLA论文】利用神经网络进行时间序列预测的综合分析，Comprehensive Analysis of Time Series Forecasting Using Neural Networks

专知会员服务

97+阅读 · 2020年2月3日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

专知

10+阅读 · 2018年2月1日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】深度学习目标检测全面综述

【推荐】深度学习目标检测全面综述

机器学习研究会

21+阅读 · 2017年9月13日

适定的多元样条逼近方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

骨髓间充质干细胞与神经干细胞序贯移植修复脊髓损伤的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

前馈神经网络的结构稀疏化设计与分析

国家自然科学基金

0+阅读 · 2014年12月31日

工程地震动随机场非平稳各向异性特征分析与物理建模

国家自然科学基金

0+阅读 · 2013年12月31日

Thbs4介导的ADSCs调控内源性干细胞转化在脊髓损伤修复中的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

采用pinball loss的MEE算法研究

国家自然科学基金

1+阅读 · 2013年12月31日

三维模型在异构空间中的语义迁移方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于随机有限元的岩体工程可靠度方法研究

国家自然科学基金

0+阅读 · 2008年12月31日

猪弓形虫体内诱导抗原的鉴定及功能分析

国家自然科学基金

0+阅读 · 2008年12月31日

树、格及Hurwitz排列中的计数问题

国家自然科学基金

0+阅读 · 2008年12月31日

Bayesian Numerical Integration with Neural Networks

Arxiv

0+阅读 · 2023年5月22日

On improving the efficiency of ADER methods

Arxiv

0+阅读 · 2023年5月22日

Complexity of Gaussian boson sampling with tensor networks

Arxiv

0+阅读 · 2023年5月22日

The Obnoxious Facility Location Game with Dichotomous Preferences

Arxiv

0+阅读 · 2023年5月21日

Training Graph Neural Networks with 1000 Layers

Arxiv

13+阅读 · 2021年6月14日

Adaptive Methods for Real-World Domain Generalization

Arxiv

13+阅读 · 2021年3月29日

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Arxiv

13+阅读 · 2020年6月24日

A Survey of Model Compression and Acceleration for Deep Neural Networks

Arxiv

67+阅读 · 2019年9月8日

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Arxiv

14+阅读 · 2019年8月8日

Graph Neural Networks: A Review of Methods and Applications

Graph Neural Networks: A Review of Methods and Applications

Arxiv

76+阅读 · 2018年12月20日

VIP会员

文章信息

相关主题

神经网络训练

深度神经网络

最新内容

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

专知会员服务

3+阅读 · 6月22日

综述 | 3D场景图：开放挑战与未来方向

综述 | 3D场景图：开放挑战与未来方向

专知会员服务

4+阅读 · 6月22日

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

专知会员服务

6+阅读 · 6月22日

21世纪的无人机战争

21世纪的无人机战争

专知会员服务

4+阅读 · 6月22日

《伊朗与以色列-美国热战及其对数字技术的影响》

《伊朗与以色列-美国热战及其对数字技术的影响》

专知会员服务

5+阅读 · 6月22日

《量子技术的军事任务技术适配与利用》

《量子技术的军事任务技术适配与利用》

专知会员服务

5+阅读 · 6月22日

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

专知会员服务

6+阅读 · 6月22日

美国从乌克兰无人机战争中学习经验

美国从乌克兰无人机战争中学习经验

专知会员服务

7+阅读 · 6月21日

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

专知会员服务

5+阅读 · 6月21日

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

专知会员服务

8+阅读 · 6月21日

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

专知会员服务

22+阅读 · 6月20日

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

专知会员服务

5+阅读 · 6月19日

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

专知会员服务

8+阅读 · 6月19日

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

7+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

9+阅读 · 6月18日

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

如何使用TensorFlow 排序构建推荐系统? How to build a recommendation system using TensorFlow Ranking?

如何使用TensorFlow 排序构建推荐系统? How to build a recommendation system using TensorFlow Ranking?

专知会员服务

19+阅读 · 2022年3月13日

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

105+阅读 · 2022年2月10日

【干货书】面向计算科学和工程的Python导论，167页pdf

【干货书】面向计算科学和工程的Python导论，167页pdf

专知会员服务

42+阅读 · 2021年4月7日

【Google】平滑对抗训练，Smooth Adversarial Training

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

49+阅读 · 2020年7月4日

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

【开放新书】可验证深度学习，91页pdf阐述Deep Learning的鲁棒性，提升安全可靠性

【开放新书】可验证深度学习，91页pdf阐述Deep Learning的鲁棒性，提升安全可靠性

专知会员服务

61+阅读 · 2020年4月11日

【ICLR2020】用实对二进制卷积训练二进制神经网络，Training Binary Neural Networks with Real-to-Binary Convolutions

【ICLR2020】用实对二进制卷积训练二进制神经网络，Training Binary Neural Networks with Real-to-Binary Convolutions

专知会员服务

26+阅读 · 2020年3月26日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

【UCLA论文】利用神经网络进行时间序列预测的综合分析，Comprehensive Analysis of Time Series Forecasting Using Neural Networks

【UCLA论文】利用神经网络进行时间序列预测的综合分析，Comprehensive Analysis of Time Series Forecasting Using Neural Networks

专知会员服务

97+阅读 · 2020年2月3日

热门VIP内容

开通专知VIP会员享更多权益服务

综述 | 3D场景图：开放挑战与未来方向

21世纪的无人机战争

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

相关资讯

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

【论文推荐】最新6篇生成式对抗网络（GAN）相关论文—半监督对抗学习、行人再识别、代表性特征、高分辨率深度卷积、自监督、超分辨

专知

10+阅读 · 2018年2月1日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】深度学习目标检测全面综述

【推荐】深度学习目标检测全面综述

机器学习研究会

21+阅读 · 2017年9月13日

相关论文

Bayesian Numerical Integration with Neural Networks

Arxiv

0+阅读 · 2023年5月22日

On improving the efficiency of ADER methods

Arxiv

0+阅读 · 2023年5月22日

Complexity of Gaussian boson sampling with tensor networks

Arxiv

0+阅读 · 2023年5月22日

The Obnoxious Facility Location Game with Dichotomous Preferences

Arxiv

0+阅读 · 2023年5月21日

Training Graph Neural Networks with 1000 Layers

Arxiv

13+阅读 · 2021年6月14日

Adaptive Methods for Real-World Domain Generalization

Arxiv

13+阅读 · 2021年3月29日

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Arxiv

13+阅读 · 2020年6月24日

A Survey of Model Compression and Acceleration for Deep Neural Networks

Arxiv

67+阅读 · 2019年9月8日

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Arxiv

14+阅读 · 2019年8月8日

Graph Neural Networks: A Review of Methods and Applications

Graph Neural Networks: A Review of Methods and Applications

Arxiv

76+阅读 · 2018年12月20日

相关基金

适定的多元样条逼近方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

骨髓间充质干细胞与神经干细胞序贯移植修复脊髓损伤的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

前馈神经网络的结构稀疏化设计与分析

国家自然科学基金

0+阅读 · 2014年12月31日

工程地震动随机场非平稳各向异性特征分析与物理建模

国家自然科学基金

0+阅读 · 2013年12月31日

Thbs4介导的ADSCs调控内源性干细胞转化在脊髓损伤修复中的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

采用pinball loss的MEE算法研究

国家自然科学基金

1+阅读 · 2013年12月31日

三维模型在异构空间中的语义迁移方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于随机有限元的岩体工程可靠度方法研究

国家自然科学基金

0+阅读 · 2008年12月31日

猪弓形虫体内诱导抗原的鉴定及功能分析

国家自然科学基金

0+阅读 · 2008年12月31日

树、格及Hurwitz排列中的计数问题

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员