GEMMFIP: Unifying GEMM in BLIS


February 17, 2023


RuQing G. Xu,Field G. Van Zee,Robert A. van de Geijn

From arXiv; 16 pages, 7 figures, 2 algorithms

Matrix libraries often focus on achieving high performance for problems considered to be either "small" or "large", as these two scenarios tend to respond best to different optimization strategies. We propose a unified technique for implementing matrix operations like general matrix multiplication (GEMM) that can achieve high performance for both small and large problem sizes. The key is to fuse packing -- an operation that copies data to a contiguous layout in memory and which is critical for large matrix performance -- with the first computational "pass" over that data. This boosts performance across the problem size spectrum. As a result, tuning general-purpose libraries becomes simpler since it obviates the need to carefully express and parameterize logic that chooses between a "small matrix" strategy and a "large matrix" strategy. A prototype implementation of the technique built with the BLAS-like Library Instantiation Software (BLIS) framework is described and performance on a range of architectures is reported.
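To make the central idea concrete, here is a minimal pure-Python sketch of fusing packing with the first computational pass. It is an illustration under stated assumptions, not the paper's algorithm and not the BLIS API: the function name `gemm_fused_pack`, the panel width `nr`, and the `source_reads` counter are all hypothetical, and a real implementation would do this inside a vectorized microkernel with cache-aware blocking. The sketch computes C += A B one column panel of B at a time. While the first row of A is multiplied against a panel, each element of B is read from its original storage and copied into a contiguous buffer as a side effect; every later row reads only from that buffer.

```python
def gemm_fused_pack(A, B, C, nr=4):
    """Compute C += A @ B on list-of-lists matrices (assumed non-empty).

    Packing of each column panel of B is fused with the first pass over it.
    Returns how many times B was read from its original storage.
    All names here are illustrative, not taken from BLIS.
    """
    m, k, n = len(A), len(B), len(B[0])
    source_reads = 0
    for jr in range(0, n, nr):               # column panels of B, nr wide
        nb = min(nr, n - jr)                 # width of this (possibly edge) panel
        packed = [0.0] * (k * nb)            # contiguous buffer for the panel
        for i in range(m):                   # each row of A makes one pass
            first_pass = (i == 0)
            for p in range(k):
                a_ip = A[i][p]
                for j in range(nb):
                    if first_pass:
                        # Fused step: read from the source layout, use the
                        # value, and store it in the packed buffer.
                        b = B[p][jr + j]
                        packed[p * nb + j] = b
                        source_reads += 1
                    else:
                        # Later passes touch only the packed copy.
                        b = packed[p * nb + j]
                    C[i][jr + j] += a_ip * b
    return source_reads


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
B = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
C = [[0.0, 0.0], [0.0, 0.0]]
reads = gemm_fused_pack(A, B, C, nr=1)
```

In this sketch there is no separate packing loop: a problem with a single row of A pays no copy-only pass over B, while a problem with many rows still gets the contiguous buffer for every pass after the first. That is one way to read the abstract's claim that a single code path can serve both small and large problem sizes.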

