Co-Design of the Dense Linear Algebra Software Stack for Multicore Processors


April 27, 2023


Héctor Martínez, Sandra Catalán, Francisco D. Igual, José R. Herrero, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí

This paper advocates for an intertwined design of the dense linear algebra software stack that breaks down the strict barriers between the high-level, blocked algorithms in LAPACK (Linear Algebra PACKage) and the low-level, architecture-dependent kernels in BLAS (Basic Linear Algebra Subprograms). Specifically, we propose customizing the GEMM (general matrix multiplication) kernel, which is invoked from the blocked algorithms for relevant matrix factorizations in LAPACK, to improve performance on modern multicore processors with hierarchical cache memories. To achieve this, we leverage an analytical model to dynamically adapt the cache configuration parameters of the GEMM to the shape of the matrix operands. Additionally, we accommodate a flexible development of architecture-specific micro-kernels that allow us to further improve the utilization of the cache hierarchy. Our experiments on two platforms, equipped with ARM (NVIDIA Carmel, Neon) and x86 (AMD EPYC, AVX2) multicore processors, demonstrate the benefits of this approach in terms of better cache utilization and, in general, higher performance. However, they also reveal the delicate balance between optimizing for multi-threaded parallelism and optimizing for cache usage.
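The abstract's central idea is to choose GEMM's cache blocking parameters from the operand shape rather than fixing them once per architecture. This matters because the trailing update in a blocked LAPACK factorization usually has an inner dimension k equal to the algorithmic block size. The Python sketch below illustrates that idea. It is a minimal sketch under stated assumptions, not the authors' analytical model or code.

It assumes a BLIS-style five-loop GEMM with blocking parameters `mc`, `nc`, `kc` and an `mr` x `nr` micro-tile. The names `CacheConfig`, `blocking_params`, `micro_kernel`, and `gemm` are hypothetical. The "each block fills half of its target cache level" rule is a simplified stand-in for a real cache model. Packing of A and B, which production GEMMs perform, is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    """Per-core cache capacities in bytes (illustrative values only)."""
    l1: int
    l2: int
    l3: int
    elem: int = 8  # bytes per element (double precision)


def _round_down(x, q):
    """Largest multiple of q that is <= x, but at least q."""
    return max(q, (x // q) * q)


def _round_up(x, q):
    """Smallest multiple of q that is >= x."""
    return -(-x // q) * q


def blocking_params(m, n, k, mr, nr, cfg, frac=0.5):
    """Hypothetical shape-aware choice of (mc, nc, kc) for C += A @ B.

    Simplified rule (an assumption, not the paper's model): a kc x nr
    micro-panel of B fills `frac` of L1, an mc x kc block of A fills
    `frac` of L2, and a kc x nc block of B fills `frac` of L3.
    """
    # kc is capped by k: a small k (e.g. the block size of a blocked
    # factorization) shrinks kc, which frees room for larger mc and nc.
    kc = max(1, min(int(frac * cfg.l1) // (nr * cfg.elem), k))
    mc = min(_round_down(int(frac * cfg.l2) // (kc * cfg.elem), mr), _round_up(m, mr))
    nc = min(_round_down(int(frac * cfg.l3) // (kc * cfg.elem), nr), _round_up(n, nr))
    return mc, nc, kc


def micro_kernel(A, B, C, i0, j0, p0, mb, nb, kb):
    """C[i0:i0+mb, j0:j0+nb] += A[i0:i0+mb, p0:p0+kb] @ B[p0:p0+kb, j0:j0+nb].

    Stands in for an architecture-specific (e.g. Neon or AVX2) micro-kernel;
    the mb x nb accumulator plays the role of the vector registers.
    """
    acc = [[0.0] * nb for _ in range(mb)]
    for p in range(p0, p0 + kb):
        b_row = B[p]
        for i in range(mb):
            a_ip = A[i0 + i][p]
            acc_row = acc[i]
            for j in range(nb):
                acc_row[j] += a_ip * b_row[j0 + j]
    for i in range(mb):
        for j in range(nb):
            C[i0 + i][j0 + j] += acc[i][j]


def gemm(A, B, C, cfg, mr=8, nr=6):
    """C += A @ B using five BLIS-style loops around a micro-kernel (no packing)."""
    m, k, n = len(A), len(B), len(B[0])
    mc, nc, kc = blocking_params(m, n, k, mr, nr, cfg)
    for jc in range(0, n, nc):                                 # loop 5: kc x nc block of B (L3)
        for pc in range(0, k, kc):                             # loop 4: split the k dimension
            kb = min(kc, k - pc)
            for ic in range(0, m, mc):                         # loop 3: mc x kc block of A (L2)
                for jr in range(jc, min(jc + nc, n), nr):      # loop 2: kc x nr micro-panel of B (L1)
                    for ir in range(ic, min(ic + mc, m), mr):  # loop 1: mr x nr tile of C
                        micro_kernel(A, B, C, ir, jr, pc,
                                     min(mr, m - ir), min(nr, n - jr), kb)
    return C
```

As an illustration, take hypothetical capacities L1 = 32 KiB, L2 = 512 KiB, L3 = 16 MiB, with `mr=8` and `nr=6`; these are not the Carmel or EPYC values. A square problem with m = n = k = 4000 then gets `(mc, nc, kc) = (96, 3072, 341)`. A trailing update with the same m and n but k = 128 gets `(256, 4002, 128)`. Here `kc` collapses to k, and the A block grows so that L2 stays occupied. That is the kind of shape-driven adaptation the abstract describes, here with a deliberately crude model.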
