Accelerating 128-bit Floating-Point Matrix Multiplication on FPGAs - 专知论文

会员服务 ·

0

Performer · 模型评估 · FPGA · 设计 · 查准率/准确率 ·

2023 年 6 月 7 日

Accelerating 128-bit Floating-Point Matrix Multiplication on FPGAs

翻译：在FPGA上加速128位浮点矩阵乘法

Fumiya Kono,Naohito Nakasato,Maho Nakata

from arxiv, 12 pages, 8 figures

General Matrix Multiplication (GEMM) is a fundamental operation widely used in scientific computations. Its performance and accuracy significantly impact the performance and accuracy of applications that depend on it. One such application is semidefinite programming (SDP), and it often requires binary128 or higher precision arithmetic to solve problems involving SDP stably. However, only some processors support binary128 arithmetic, which makes SDP solvers generally slow. In this study, we focused on accelerating GEMM with binary128 arithmetic on field-programmable gate arrays (FPGAs) to enable the flexible design of accelerators for the desired computations. Our binary128 GEMM designs on a recent high-performance FPGA achieved approximately 90GFlops, 147x faster than the computation executed on a recent CPU with 20 threads for large matrices. Using our binary128 GEMM design on the FPGA, we successfully accelerated two numerical applications: LU decomposition and SDP problems, for the first time.

翻译：通用矩阵乘法（GEMM）是科学计算中广泛使用的基础运算，其性能与精度显著影响依赖该运算的应用表现。例如半定规划（SDP）问题需要采用binary128或更高精度的算术运算以确保稳定性，然而仅有少数处理器支持binary128算术运算，导致SDP求解器通常运行缓慢。本研究聚焦于在FPGA（现场可编程门阵列）上加速binary128算术的GEMM运算，从而为目标计算设计灵活的加速器。我们在新型高性能FPGA上实现的binary128 GEMM设计，对于大尺寸矩阵可实现约90GFlops的算力，较采用20线程的现代CPU快147倍。通过将FPGA上的binary128 GEMM设计应用于数值计算，我们首次成功加速了LU分解与SDP问题两类数值应用。

0

相关内容

Performer

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Deep Compression/Acceleration：模型压缩加速论文汇总

Deep Compression/Acceleration：模型压缩加速论文汇总

极市平台

14+阅读 · 2019年5月15日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

miR-5591靶向AGER/ROS/JNK抑制MSCs氧化应激损伤在糖尿病创面修复中的作用及机制

国家自然科学基金

0+阅读 · 2015年12月31日

基于虚拟并发数据流和有效控制数据驱动的无线网络化系统协同设计

国家自然科学基金

0+阅读 · 2014年12月31日

面向逆时偏移算法的FPGA加速技术研究

国家自然科学基金

2+阅读 · 2013年12月31日

虚拟化环境中高效节能的内存资源动态管理技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于集群OFDM的低功耗电力线通信收发端设计

国家自然科学基金

0+阅读 · 2013年12月31日

面向基于图的数据挖掘的FPGA加速方法研究

国家自然科学基金

2+阅读 · 2012年12月31日

异构众核处理器自适应任务调度理论与方法

国家自然科学基金

2+阅读 · 2012年12月31日

矩阵分解的低延迟并行算法

国家自然科学基金

0+阅读 · 2009年12月31日

异步低功耗LDPC解码器设计

国家自然科学基金

0+阅读 · 2009年12月31日

Unsupervised ANN-Based Equalizer and Its Trainable FPGA Implementation

Arxiv

0+阅读 · 2023年7月28日

No-Regret Constrained Bayesian Optimization of Noisy and Expensive Hybrid Models using Differentiable Quantile Function Approximations

Arxiv

0+阅读 · 2023年7月28日

Solving Data Quality Problems with Desbordante: a Demo

Arxiv

0+阅读 · 2023年7月27日

Quantum Computer Simulations at Warp Speed: Assessing the Impact of GPU Acceleration

Arxiv

0+阅读 · 2023年7月27日

Semi-Grant-Free Orthogonal Multiple Access with Partial-Information for Short Packet Transmissions

Arxiv

0+阅读 · 2023年7月27日

Accelerating Polynomial Modular Multiplication with Crossbar-Based Compute-in-Memory

Arxiv

0+阅读 · 2023年7月27日

High-Speed Area-Efficient Hardware Architecture for the Efficient Detection of Faults in a Bit-Parallel Multiplier Utilizing the Polynomial Basis of GF(2m)

Arxiv

0+阅读 · 2023年7月26日

A Large-Scale Feasibility Study of Screen-based 3D Visualization and Augmented Reality Tools for Human Anatomy Education: Exploring Gender Perspectives in Learning Experience

Arxiv

0+阅读 · 2023年7月26日

Teal: Learning-Accelerated Optimization of WAN Traffic Engineering

Arxiv

0+阅读 · 2023年7月26日

A Novel Computationally Efficient Group Signature for Anonymous and Secure V2X Communications

Arxiv

0+阅读 · 2023年7月25日

VIP会员

文章信息

相关主题

查准率/准确率

最新内容

《无人机对海面作战影响评估》

《无人机对海面作战影响评估》

专知会员服务

9+阅读 · 7月21日

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

专知会员服务

8+阅读 · 7月21日

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

专知会员服务

3+阅读 · 7月21日

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

专知会员服务

5+阅读 · 7月21日

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

专知会员服务

6+阅读 · 7月21日

印度精确打击与指挥架构的断层

印度精确打击与指挥架构的断层

专知会员服务

5+阅读 · 7月20日

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

专知会员服务

7+阅读 · 7月20日

美空军AI完成F-16战斗机自主空战历史性试飞

美空军AI完成F-16战斗机自主空战历史性试飞

专知会员服务

6+阅读 · 7月20日

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

专知会员服务

8+阅读 · 7月20日

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

专知会员服务

7+阅读 · 7月20日

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

专知会员服务

9+阅读 · 7月20日

综述 | 终身视觉表征：持续自监督学习CSSL系统综述

综述 | 终身视觉表征：持续自监督学习CSSL系统综述

专知会员服务

8+阅读 · 7月20日

深入Project Maven：为何人工智能在战场上依然失灵

深入Project Maven：为何人工智能在战场上依然失灵

专知会员服务

15+阅读 · 7月19日

锻造未来士兵：外骨骼、基因工程与赛博格

锻造未来士兵：外骨骼、基因工程与赛博格

专知会员服务

8+阅读 · 7月19日

《无人机系统（UAS）通信网状网络试验性部署》50页报告

《无人机系统（UAS）通信网状网络试验性部署》50页报告

专知会员服务

10+阅读 · 7月19日

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

《无人机对海面作战影响评估》

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Deep Compression/Acceleration：模型压缩加速论文汇总

Deep Compression/Acceleration：模型压缩加速论文汇总

极市平台

14+阅读 · 2019年5月15日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

相关论文

Unsupervised ANN-Based Equalizer and Its Trainable FPGA Implementation

Arxiv

0+阅读 · 2023年7月28日

No-Regret Constrained Bayesian Optimization of Noisy and Expensive Hybrid Models using Differentiable Quantile Function Approximations

Arxiv

0+阅读 · 2023年7月28日

Solving Data Quality Problems with Desbordante: a Demo

Arxiv

0+阅读 · 2023年7月27日

Quantum Computer Simulations at Warp Speed: Assessing the Impact of GPU Acceleration

Arxiv

0+阅读 · 2023年7月27日

Semi-Grant-Free Orthogonal Multiple Access with Partial-Information for Short Packet Transmissions

Arxiv

0+阅读 · 2023年7月27日

Accelerating Polynomial Modular Multiplication with Crossbar-Based Compute-in-Memory

Arxiv

0+阅读 · 2023年7月27日

High-Speed Area-Efficient Hardware Architecture for the Efficient Detection of Faults in a Bit-Parallel Multiplier Utilizing the Polynomial Basis of GF(2m)

Arxiv

0+阅读 · 2023年7月26日

A Large-Scale Feasibility Study of Screen-based 3D Visualization and Augmented Reality Tools for Human Anatomy Education: Exploring Gender Perspectives in Learning Experience

Arxiv

0+阅读 · 2023年7月26日

Teal: Learning-Accelerated Optimization of WAN Traffic Engineering

Arxiv

0+阅读 · 2023年7月26日

A Novel Computationally Efficient Group Signature for Anonymous and Secure V2X Communications

Arxiv

0+阅读 · 2023年7月25日

相关基金

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

miR-5591靶向AGER/ROS/JNK抑制MSCs氧化应激损伤在糖尿病创面修复中的作用及机制

国家自然科学基金

0+阅读 · 2015年12月31日

基于虚拟并发数据流和有效控制数据驱动的无线网络化系统协同设计

国家自然科学基金

0+阅读 · 2014年12月31日

面向逆时偏移算法的FPGA加速技术研究

国家自然科学基金

2+阅读 · 2013年12月31日

虚拟化环境中高效节能的内存资源动态管理技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于集群OFDM的低功耗电力线通信收发端设计

国家自然科学基金

0+阅读 · 2013年12月31日

面向基于图的数据挖掘的FPGA加速方法研究

国家自然科学基金

2+阅读 · 2012年12月31日

异构众核处理器自适应任务调度理论与方法

国家自然科学基金

2+阅读 · 2012年12月31日

矩阵分解的低延迟并行算法

国家自然科学基金

0+阅读 · 2009年12月31日

异步低功耗LDPC解码器设计

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员