Optimizing Irregular Communication with Neighborhood Collectives and Locality-Aware Parallelism

June 2, 2023

Gerald Collom, Rui Peng Li, Amanda Bienz

Irregular communication often limits both the performance and scalability of parallel applications. Typically, each application implements its irregular messages with point-to-point communication, and any optimizations are added directly into the application. As a result, these optimizations lack portability. There is no easy way to optimize point-to-point messages within MPI, as the interface for single messages provides no information on the collection of all communication to be performed. However, the persistent neighbor collective API, released in the MPI 4 standard, provides an interface for portable optimizations of irregular communication within MPI libraries. This paper presents methods for optimizing irregular communication within neighborhood collectives, analyzes the impact of replacing point-to-point communication in existing codebases such as Hypre BoomerAMG with neighborhood collectives, and finally shows a speedup of up to 1.32x on sparse matrix-vector multiplication within a BoomerAMG solve through the use of the optimized neighbor collectives. The authors analyze multiple implementations of neighborhood collectives, including a standard implementation, which simply wraps standard point-to-point communication, as well as multiple implementations of locality-aware aggregation. All optimizations are available in an open-source codebase, MPI Advance, which sits on top of MPI, allowing optimizations to be added to existing codebases regardless of the system's MPI installation.
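The abstract does not spell out how locality-aware aggregation works, so the following is only a rough illustrative model, not the authors' implementation or the MPI Advance API. It assumes that ranks are assigned to nodes in contiguous blocks and that aggregation combines all data travelling between the same pair of nodes into a single inter-node message. The names `node_of`, `count_direct_inter_node`, `count_aggregated_inter_node`, and the sample `sends` pattern are hypothetical.

```python
def node_of(rank, ranks_per_node):
    """Map a rank to its node, assuming contiguous block placement."""
    return rank // ranks_per_node


def count_direct_inter_node(sends, ranks_per_node):
    """Point-to-point model: one message per (source rank, destination rank)
    pair whose endpoints sit on different nodes."""
    return sum(
        1
        for src, dsts in sends.items()
        for dst in set(dsts)
        if node_of(src, ranks_per_node) != node_of(dst, ranks_per_node)
    )


def count_aggregated_inter_node(sends, ranks_per_node):
    """Aggregated model: one message per ordered (source node, destination
    node) pair, however many rank pairs communicate across it."""
    node_pairs = set()
    for src, dsts in sends.items():
        for dst in dsts:
            a = node_of(src, ranks_per_node)
            b = node_of(dst, ranks_per_node)
            if a != b:
                node_pairs.add((a, b))
    return len(node_pairs)


# Hypothetical irregular pattern: source rank -> list of destination ranks.
sends = {0: [4, 5, 6], 1: [4, 5], 2: [7, 8], 3: [1]}

direct = count_direct_inter_node(sends, ranks_per_node=4)
aggregated = count_aggregated_inter_node(sends, ranks_per_node=4)
print(direct, aggregated)
```

In this model the reduction in inter-node messages is not free: gathering data on the sending node and redistributing it on the receiving node adds intra-node traffic, which the sketch does not count.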

