Hulk: Graph Neural Networks for Optimizing Regionally Distributed Computing Systems

Topics: Parallelism · Deep learning models · Systems · Learning models · Graph neural networks

April 13, 2023

Zhengqing Yuan, Huiwen Xue, Chao Zhang, Yongming Liu

From arXiv, 16 pages, 10 figures. Accepted by the Intelligent Systems Conference (IntelliSys 2023).

Large deep learning models have shown great potential for delivering exceptional results in various applications. However, training them is challenging because of their vast size, often hundreds of billions of parameters. Common distributed training methods, such as data parallelism, tensor parallelism, and pipeline parallelism, demand significant data communication throughout the process, leading to prolonged wait times for some machines in physically distant distributed systems. To address this issue, we propose a novel solution called Hulk, which utilizes a modified graph neural network to optimize distributed computing systems. Hulk not only optimizes data communication efficiency between different countries, or even between different regions within the same city, but also provides optimal distributed deployment of models in parallel. For example, it can place certain layers on a machine in a specific region or pass specific parameters of a model to a machine in a particular location. By using Hulk in experiments, we were able to improve the time efficiency of training large deep learning models on distributed systems by more than 20%. Our open-source collection of unlabeled data: https://github.com/DLYuanGod/Hulk.
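To make the placement problem in the abstract concrete, here is a minimal sketch of why the order of pipeline stages across distant machines matters. It is not Hulk's method: the abstract describes a modified graph neural network, whereas this sketch swaps in an exhaustive search over a tiny machine graph. The machine names, the latency values, and the helper functions `latency`, `pipeline_cost`, and `best_pipeline_order` are all hypothetical and chosen only for illustration.

```python
from itertools import permutations

# Hypothetical symmetric latencies in milliseconds between machines.
# These numbers are illustrative assumptions, not measurements from the paper.
LATENCY_MS = {
    ("beijing", "shanghai"): 30,
    ("beijing", "frankfurt"): 180,
    ("beijing", "virginia"): 220,
    ("shanghai", "frankfurt"): 200,
    ("shanghai", "virginia"): 210,
    ("frankfurt", "virginia"): 90,
}


def latency(a, b):
    """Look up the latency between two machines, in either key order."""
    if (a, b) in LATENCY_MS:
        return LATENCY_MS[(a, b)]
    return LATENCY_MS[(b, a)]


def pipeline_cost(order):
    """Sum the latencies between consecutive pipeline stages."""
    return sum(latency(a, b) for a, b in zip(order, order[1:]))


def best_pipeline_order(machines):
    """Exhaustively search all stage orderings.

    This is factorial in the number of machines, so it only works for a
    handful of nodes; it stands in for a learned placement model here.
    """
    return min(permutations(machines), key=pipeline_cost)


machines = ["beijing", "frankfurt", "shanghai", "virginia"]
naive_cost = pipeline_cost(machines)      # stages assigned in list order
best_order = best_pipeline_order(machines)
best_cost = pipeline_cost(best_order)

print("naive order cost (ms):", naive_cost)
print("best order:", best_order, "cost (ms):", best_cost)
```

With these assumed latencies, keeping nearby machines adjacent in the pipeline roughly halves the per-pass communication cost relative to the naive ordering. Exhaustive search stops being feasible beyond a few machines, which is one reason a learned model over the machine graph is attractive for this kind of problem.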
