Optimization of SpGEMM with Risc-V vector instructions

March 4, 2023

Valentin Le Fèvre, Marc Casas

The Sparse GEneral Matrix-Matrix multiplication (SpGEMM) $C = A \times B$ is a fundamental routine extensively used in domains such as machine learning and graph analytics. Despite its relevance, the efficient execution of SpGEMM on vector architectures is a relatively unexplored topic. The most recent algorithm to run SpGEMM on these architectures is based on the SParse Accumulator (SPA) approach. Because it computes the columns of C one by one, it is relatively efficient for sparse matrices featuring several tens of non-zero coefficients per column. However, when dealing with matrices containing just a few non-zero coefficients per column, the state-of-the-art algorithm is not able to fully exploit long vector architectures when computing the SpGEMM kernel. To overcome this issue, we propose the SPA paRallel with Sorting (SPARS) algorithm, which computes several C columns in parallel among other optimizations, and the HASH algorithm, which uses dynamically sized hash tables to store intermediate output values. To combine the efficiency of SPA for relatively dense matrix blocks with the high performance that SPARS and HASH deliver for very sparse matrix blocks, we propose H-SPA(t) and H-HASH(t), which dynamically switch between different algorithms. H-SPA(t) and H-HASH(t) obtain 1.24$\times$ and 1.57$\times$ average speed-ups with respect to SPA, respectively, over a set of 40 sparse matrices obtained from the SuiteSparse Matrix Collection. For the 22 sparsest matrices, H-SPA(t) and H-HASH(t) deliver 1.42$\times$ and 1.99$\times$ average speed-ups, respectively.
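
To make the two accumulator styles named in the abstract concrete, the following is a minimal scalar sketch in Python, not the authors' vectorized RISC-V implementation. It assumes a compressed sparse column (CSC) layout stored as a plain tuple; the names `CSC`, `spgemm_spa`, and `spgemm_hash` are hypothetical, and a Python `dict` stands in for a dynamically sized hash table. Both functions compute one column of C at a time and differ only in where the intermediate output values are accumulated.

```python
from typing import Dict, List, Tuple

# Assumed layout (not taken from the paper): (n_rows, col_ptr, row_idx, values).
CSC = Tuple[int, List[int], List[int], List[float]]


def spgemm_spa(a: CSC, b: CSC) -> CSC:
    """Column-by-column C = A x B using a dense sparse accumulator (SPA)."""
    n_rows, a_ptr, a_idx, a_val = a
    _, b_ptr, b_idx, b_val = b
    acc = [0.0] * n_rows         # one value slot per row of C
    occupied = [False] * n_rows  # which slots are in use for the current column
    c_ptr, c_idx, c_val = [0], [], []
    for j in range(len(b_ptr) - 1):                  # one column of C at a time
        touched: List[int] = []
        for p in range(b_ptr[j], b_ptr[j + 1]):      # non-zeros B[k, j]
            k, bkj = b_idx[p], b_val[p]
            for q in range(a_ptr[k], a_ptr[k + 1]):  # add B[k, j] * A[:, k]
                i = a_idx[q]
                if not occupied[i]:
                    occupied[i] = True
                    touched.append(i)
                acc[i] += a_val[q] * bkj
        for i in sorted(touched):                    # gather, then reset the SPA
            c_idx.append(i)
            c_val.append(acc[i])
            acc[i] = 0.0
            occupied[i] = False
        c_ptr.append(len(c_idx))
    return n_rows, c_ptr, c_idx, c_val


def spgemm_hash(a: CSC, b: CSC) -> CSC:
    """Same product, accumulating each column in a hash table instead."""
    n_rows, a_ptr, a_idx, a_val = a
    _, b_ptr, b_idx, b_val = b
    c_ptr, c_idx, c_val = [0], [], []
    for j in range(len(b_ptr) - 1):
        table: Dict[int, float] = {}                 # grows with the column's non-zeros
        for p in range(b_ptr[j], b_ptr[j + 1]):
            k, bkj = b_idx[p], b_val[p]
            for q in range(a_ptr[k], a_ptr[k + 1]):
                i = a_idx[q]
                table[i] = table.get(i, 0.0) + a_val[q] * bkj
        for i in sorted(table):
            c_idx.append(i)
            c_val.append(table[i])
        c_ptr.append(len(c_idx))
    return n_rows, c_ptr, c_idx, c_val


# A = [[1, 0], [2, 3]] and B = [[0, 4], [5, 0]], so C = [[0, 4], [15, 8]].
A: CSC = (2, [0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0])
B: CSC = (2, [0, 1, 2], [1, 0], [5.0, 4.0])
C = spgemm_spa(A, B)
```

In this sketch the SPA buffer has one slot per row regardless of how few non-zeros a column produces, while the hash table only holds the entries that actually occur. That difference in storage is the only point the example illustrates; it says nothing about the vector-length utilization or the speed-ups reported in the abstract.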

