Quantum Circuit Simulation by SGEMM Emulation on Tensor Cores and Automatic Precision Selection


March 15, 2023


Hiroyuki Ootomo, Hidetaka Manabe, Kenji Harada, Rio Yokota

From arXiv. This paper has been accepted by ISC'23.

Quantum circuit simulation provides the foundation for the development of quantum algorithms and the verification of quantum supremacy. Among the various methods for quantum circuit simulation, tensor network contraction has been increasing in popularity due to its ability to simulate a larger number of qubits. During tensor contraction, the input tensors are reshaped to matrices and multiplied by a GEMM operation, and these GEMM operations can account for up to 90% of the total calculation time. GEMM throughput can be improved by utilizing mixed-precision hardware such as Tensor Cores, but a straightforward implementation results in insufficient fidelity for deep and large quantum circuits. Prior work has demonstrated that compensated summation with special care of the rounding mode can fully recover the FP32 precision of SGEMM even when using TF32 or FP16 Tensor Cores. The exponent range is a critical issue when applying such techniques to quantum circuit simulation. While TF32 supports almost the same exponent range as FP32, FP16 supports a much smaller exponent range. In this work, we use the exponent range statistics of input tensor elements to select which Tensor Cores we use for the GEMM. We evaluate our method on Random Circuit Sampling (RCS), including Sycamore's quantum circuit, and show that throughput is up to 1.86 times higher while maintaining accuracy.
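To make the exponent-range argument concrete, the following is a minimal Python sketch, not the authors' implementation. It gathers the base-2 exponent range of the input elements and applies a hypothetical selection rule (FP16 only when every nonzero element is an FP16 normal number, otherwise TF32). It also measures how well a value is rebuilt from two FP16 terms, a simplified stand-in for the compensated approach mentioned in the abstract. The names `exponent_range`, `select_tensor_core`, `split_rel_error`, the residual scaling factor `SCALE`, and the threshold rule are all assumptions made for illustration; real tensors would be complex-valued and processed on the GPU.

```python
import math
import struct

# Unbiased exponent limits of IEEE binary16 normal numbers (2**-14 .. 2**15).
FP16_MIN_EXP, FP16_MAX_EXP = -14, 15
SCALE = 2.0 ** 11  # residual scaling factor; an assumption of this sketch


def to_fp16(x):
    """Round x to the nearest binary16 value (tiny inputs become subnormal or zero)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round x to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def exponent_range(values):
    """Return (min, max) base-2 exponents over the nonzero elements, or None."""
    # math.frexp writes v = m * 2**e with 0.5 <= |m| < 1, so floor(log2(|v|)) = e - 1.
    exps = [math.frexp(v)[1] - 1 for v in values if v != 0.0]
    return (min(exps), max(exps)) if exps else None


def select_tensor_core(a, b):
    """Hypothetical rule: pick FP16 only if every nonzero element is FP16-normal."""
    for tensor in (a, b):
        r = exponent_range(tensor)
        if r is not None and (r[0] < FP16_MIN_EXP or r[1] > FP16_MAX_EXP):
            return "tf32"
    return "fp16"


def split_rel_error(values):
    """Worst relative error of rebuilding each nonzero value from two FP16 terms."""
    worst = 0.0
    for v in values:
        hi = to_fp16(v)                 # leading FP16 term
        lo = to_fp16((v - hi) * SCALE)  # scaled residual, also stored in FP16
        worst = max(worst, abs(hi + lo / SCALE - v) / abs(v))
    return worst


in_range = [to_fp32(1.0 / 3.0), 0.75, 2.0 ** -10]
tiny = [v * 2.0 ** -30 for v in in_range]  # same mantissas, exponents shifted by -30

print(select_tensor_core(in_range, in_range))  # prints fp16
print(select_tensor_core(in_range, tiny))      # prints tf32
print(split_rel_error(in_range), split_rel_error(tiny))
```

In this sketch the two-term FP16 representation rebuilds the in-range values to roughly FP32 accuracy, while the same values scaled by 2^-30 underflow in FP16 and lose most or all of their significant bits. That loss is the reason a format with a wider exponent range is preferable for such inputs.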

