Quantum circuit simulation provides the foundation for developing quantum algorithms and verifying quantum supremacy. Among the various methods for quantum circuit simulation, tensor network contraction has been growing in popularity because it can simulate a larger number of qubits. During tensor contraction, the input tensors are reshaped into matrices and multiplied with a GEMM operation, and these GEMM operations can account for up to 90\% of the total computation time. GEMM throughput can be improved by using mixed-precision hardware such as Tensor Cores, but a straightforward implementation yields insufficient fidelity for deep and large quantum circuits. Prior work has demonstrated that compensated summation with careful handling of the rounding mode can fully recover the FP32 precision of SGEMM even when using TF32 or FP16 Tensor Cores. The exponent range is a critical issue when applying such techniques to quantum circuit simulation: while TF32 supports almost the same exponent range as FP32, FP16 supports a much smaller one. In this work, we use statistics on the exponent range of the input tensor elements to select which type of Tensor Core to use for the GEMM. We evaluate our method on Random Circuit Sampling (RCS), including Sycamore's quantum circuit, and show that it achieves up to 1.86 times higher throughput while maintaining accuracy.
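As a rough illustration of the selection step, the sketch below chooses between FP16 and TF32 from the exponent spread of the input elements. It is a minimal sketch under stated assumptions, not the authors' actual criterion. The function names `exponent_of` and `select_tensor_core`, the `margin` parameter, and the decision rule are all hypothetical. The rule chooses FP16 if the gap between the largest and smallest nonzero exponents fits FP16's normal range after a common power-of-two scaling. Complex amplitudes are assumed to be passed in as their real and imaginary parts.

```python
import math
from typing import Iterable

# IEEE 754 binary16 (FP16) normal numbers have unbiased exponents in [-14, 15].
FP16_MIN_EXP = -14
FP16_MAX_EXP = 15


def exponent_of(x: float) -> int:
    """Return e such that 2**e <= |x| < 2**(e + 1), for nonzero finite x."""
    _, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    return e - 1


def select_tensor_core(elements: Iterable[float], margin: int = 0) -> str:
    """Hypothetical selector (an assumption, not the paper's exact rule).

    Returns "fp16" if the exponent spread of the nonzero elements fits in
    FP16's normal range (after scaling by a common power of two so the
    largest element maps near the top of that range), minus `margin`
    binades of headroom. Otherwise returns "tf32", whose exponent range
    is essentially that of FP32.
    """
    exps = [exponent_of(x) for x in elements if x != 0.0]
    if not exps:
        # All-zero input: any format represents it exactly.
        return "fp16"
    spread = max(exps) - min(exps)
    fp16_span = FP16_MAX_EXP - FP16_MIN_EXP  # 29 binades of normal exponents
    return "fp16" if spread <= fp16_span - margin else "tf32"
```

For example, `select_tensor_core([1.0, 0.5, 3e-3])` returns `"fp16"` because the spread is 9 binades. In contrast, `select_tensor_core([1.0, 1e-12])` returns `"tf32"` because its spread of 40 binades exceeds FP16's 29. The `margin` argument reserves extra headroom, for instance for the scaled low-order correction terms used in compensated schemes. Its value is left as an assumption here.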