利用长向量优化CFD代码：协同设计案例研究 (Exploiting long vectors with a CFD code: a co-design show case) - 专知论文

会员服务 ·

0

向量化 · 代码 · CASE · 编译器 · Performer ·

2024 年 10 月 27 日

Exploiting long vectors with a CFD code: a co-design show case

翻译：利用长向量优化CFD代码：协同设计案例研究

Marc Blancafort,Roger Ferrer,Guillaume Houzeaux,Marta Garcia-Gasulla,Filippo Mantovani

from arxiv, Main track paper, presented at IPDPS 2024

A current trend in HPC systems is the utilization of architectures with SIMD or vector extensions to exploit data parallelism. There are several ways to take advantage of such modern vector architectures, each with a different impact on the code and its portability. For example, the use of intrinsics, guided vectorization via pragmas, or compiler autovectorization. Our objectives are to maximize vectorization efficiency and minimize code specialization. To achieve these objectives, we rely on compiler autovectorization. We leverage a set of hardware and software tools that allow us to analyze in detail where autovectorization is suboptimal. Thus, we apply an iterative methodology that allows us to incrementally improve the efficient use of the underlying hardware. In this paper, we apply this methodology to a CFD production code. We evaluate the performance on an innovative configurable platform powered by a RISC-V core coupled with a wide vector unit capable of operating with up to 256 double precision elements. Following the vectorization process, we demonstrate a single-core speedup of 7.6$\times$ compared to its scalar implementation. Furthermore, we show that code portability is not compromised, as our solution continues to exhibit performance benefits, or at the very least, no drawbacks, on other HPC architectures such as Intel x86 and NEC SX-Aurora.

翻译：当前高性能计算系统的发展趋势是利用具有SIMD或向量扩展的架构来开发数据并行性。利用此类现代向量架构存在多种方式，每种方式对代码及其可移植性产生不同影响，例如使用内联函数、通过编译指示引导向量化或编译器自动向量化。我们的目标是最大化向量化效率并最小化代码特化。为实现这些目标，我们依托编译器自动向量化技术，并借助一套硬件与软件工具集来精确分析自动向量化的次优环节。通过采用迭代式方法论，我们能够逐步提升底层硬件的利用效率。本文将这一方法论应用于计算流体力学生产代码，并在基于RISC-V核心的创新可配置平台上进行性能评估，该平台搭载的宽向量单元可支持多达256个双精度元素运算。经过向量化优化后，相较于标量实现版本，我们实现了单核7.6$\times$的加速比。此外，我们的解决方案在保持代码可移植性的同时，在其他高性能计算架构（如Intel x86和NEC SX-Aurora）上仍能保持性能优势或至少无性能损失。

0

相关内容

向量化

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

66+阅读 · 2023年6月10日

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【ACL2020】多模态信息抽取，365页ppt

【ACL2020】多模态信息抽取，365页ppt

专知会员服务

151+阅读 · 2020年7月6日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Github项目推荐 | 股市预测的机器学习/深度学习模型/资源集锦

Github项目推荐 | 股市预测的机器学习/深度学习模型/资源集锦

AI研习社

33+阅读 · 2019年4月18日

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

AI研习社

32+阅读 · 2019年4月5日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

城市“建成环境——空间行为”的多尺度影响关系与机理研究

国家自然科学基金

13+阅读 · 2017年12月31日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

直接优化半周长线长的VLSI两阶段迭代布局算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

SDN数据平面中大规模流表的高性能查找方法研究

国家自然科学基金

4+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

PPP项目争端谈判及其治理机制研究

国家自然科学基金

2+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

关于面板(纵向）数据的动态统计分析

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

基于UGC的应急响应决策支持系统关键技术研究

国家自然科学基金

13+阅读 · 2014年12月31日

Exploiting ray tracing technology through OptiX to compute particle interactions with cutoff in a 3D environment on GPU

Arxiv

1+阅读 · 2024年12月16日

Wasserstein Bounds for generative diffusion models with Gaussian tail targets

Arxiv

1+阅读 · 2024年12月15日

Mathematical models for nonlinear ultrasound contrast imaging with microbubbles

Arxiv

1+阅读 · 2024年12月14日

Relational event models with global covariates

Arxiv

1+阅读 · 2024年12月14日

On the asymptotic properties of product-PCA under the high-dimensional setting

Arxiv

1+阅读 · 2024年12月14日

One world, one opinion? The superstar effect in LLM responses

Arxiv

1+阅读 · 2024年12月13日

A unified convergence analysis framework of the energy-stable ETDRK3 schemes for the No-slope-selection thin film model

Arxiv

1+阅读 · 2024年12月13日

Vision CNNs trained to estimate spatial latents learned similar ventral-stream-aligned representations

Arxiv

1+阅读 · 2024年12月12日

Beyond forecast leaderboards: Measuring individual model importance based on contribution to ensemble accuracy

Arxiv

1+阅读 · 2024年12月12日

Fine-grained Entity Typing via Label Reasoning

Arxiv

12+阅读 · 2021年9月13日

VIP会员

文章信息

相关主题

相关VIP内容

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

66+阅读 · 2023年6月10日

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【ACL2020】多模态信息抽取，365页ppt

【ACL2020】多模态信息抽取，365页ppt

专知会员服务

151+阅读 · 2020年7月6日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

论学习、公平性与复杂度

《整合杀伤链：一个用于边缘目标验证与战术推理的零样本框架》最新资料

2025中国人工智能学会系列白皮书⸺棋盘上的人工智能|附下载

通用智能体评估的逻辑架构

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Github项目推荐 | 股市预测的机器学习/深度学习模型/资源集锦

Github项目推荐 | 股市预测的机器学习/深度学习模型/资源集锦

AI研习社

33+阅读 · 2019年4月18日

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

AI研习社

32+阅读 · 2019年4月5日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

相关论文

Exploiting ray tracing technology through OptiX to compute particle interactions with cutoff in a 3D environment on GPU

Arxiv

1+阅读 · 2024年12月16日

Wasserstein Bounds for generative diffusion models with Gaussian tail targets

Arxiv

1+阅读 · 2024年12月15日

Mathematical models for nonlinear ultrasound contrast imaging with microbubbles

Arxiv

1+阅读 · 2024年12月14日

Relational event models with global covariates

Arxiv

1+阅读 · 2024年12月14日

On the asymptotic properties of product-PCA under the high-dimensional setting

Arxiv

1+阅读 · 2024年12月14日

One world, one opinion? The superstar effect in LLM responses

Arxiv

1+阅读 · 2024年12月13日

A unified convergence analysis framework of the energy-stable ETDRK3 schemes for the No-slope-selection thin film model

Arxiv

1+阅读 · 2024年12月13日

Vision CNNs trained to estimate spatial latents learned similar ventral-stream-aligned representations

Arxiv

1+阅读 · 2024年12月12日

Beyond forecast leaderboards: Measuring individual model importance based on contribution to ensemble accuracy

Arxiv

1+阅读 · 2024年12月12日

Fine-grained Entity Typing via Label Reasoning

Arxiv

12+阅读 · 2021年9月13日

相关基金

城市“建成环境——空间行为”的多尺度影响关系与机理研究

国家自然科学基金

13+阅读 · 2017年12月31日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

直接优化半周长线长的VLSI两阶段迭代布局算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

SDN数据平面中大规模流表的高性能查找方法研究

国家自然科学基金

4+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

PPP项目争端谈判及其治理机制研究

国家自然科学基金

2+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

关于面板(纵向）数据的动态统计分析

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

基于UGC的应急响应决策支持系统关键技术研究

国家自然科学基金

13+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员