High-performance classical simulators of quantum circuits, in particular those based on tensor network contraction, have become an important tool for validating noisy quantum computers. Slicing is used to work around memory limits by reducing tensor sizes, but it can introduce additional computational overhead that significantly slows down overall performance. This paper proposes novel lifetime-based methods to reduce slicing overhead and improve computing efficiency. These include an interpretation method for analyzing slicing overhead, an in-place slicing strategy that finds the smallest slicing set, and an adaptive tensor network contraction path refiner customized for the Sunway architecture. Experiments show that, in most cases, our in-place slicing strategy incurs less slicing overhead than cotengra, currently the most widely used contraction path optimization software. Finally, the simulation time for the random quantum circuit (RQC) of the Sycamore quantum processor is reduced to 96.1 s, with a sustained single-precision performance of 308.6 Pflops on over 41M cores to generate 1M correlated samples. This is more than a 5-fold improvement over the 60.4 Pflops reported in the 2021 Gordon Bell Prize work.
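The overhead that slicing introduces can be illustrated with a toy example. The sketch below is a minimal illustration under stated assumptions, not the authors' lifetime-based method or their Sunway implementation. It contracts a four-tensor ring s = Σ P[a][b] Q[b][c] R[c][d] S[d][a] along one fixed contraction path and then slices index a into independent sub-tasks. The sliced result matches the unsliced one, but the step that contracts Q with R never involves a, so every slice repeats it. The function names `contract_ring` and `compare`, the chosen path, and the tensor sizes are hypothetical and chosen only for illustration.

```python
import random


def contract_ring(P, Q, R, S, a_values):
    """Contract s = sum_{a,b,c,d} P[a][b] Q[b][c] R[c][d] S[d][a] along one
    fixed (hypothetical) path, restricted to the values of index a listed in
    a_values. Path: N = Q.R over c, then P.N over b, then close with S.
    Returns (value, number_of_multiply_adds)."""
    db, dc, dd = len(Q), len(R), len(S)
    mult_adds = 0
    # Step 1: N[b][d] = sum_c Q[b][c] R[c][d]. Index a does not appear here,
    # so when a is sliced, every slice recomputes this whole step.
    N = [[0 for _ in range(dd)] for _ in range(db)]
    for b in range(db):
        for c in range(dc):
            for d in range(dd):
                N[b][d] += Q[b][c] * R[c][d]
                mult_adds += 1
    value = 0
    for a in a_values:
        for d in range(dd):
            # Step 2: V[a][d] = sum_b P[a][b] N[b][d] (index a appears here).
            v = 0
            for b in range(db):
                v += P[a][b] * N[b][d]
                mult_adds += 1
            # Step 3: accumulate V[a][d] * S[d][a] to close the ring.
            value += v * S[d][a]
            mult_adds += 1
    return value, mult_adds


def random_matrix(rng, rows, cols):
    # Small integer entries keep the arithmetic exact, so results compare with ==.
    return [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]


def compare(da=4, db=3, dc=5, dd=2, seed=0):
    rng = random.Random(seed)
    P = random_matrix(rng, da, db)
    Q = random_matrix(rng, db, dc)
    R = random_matrix(rng, dc, dd)
    S = random_matrix(rng, dd, da)
    # Unsliced: one contraction over all values of a.
    full, full_cost = contract_ring(P, Q, R, S, range(da))
    # Sliced on a: one independent sub-task per value of a, results summed.
    sliced, sliced_cost = 0, 0
    for a in range(da):
        part, cost = contract_ring(P, Q, R, S, [a])
        sliced += part
        sliced_cost += cost
    return full, full_cost, sliced, sliced_cost


if __name__ == "__main__":
    full, full_cost, sliced, sliced_cost = compare()
    print(full == sliced, full_cost, sliced_cost)
```

Running the script prints `True 62 152`: the sliced sum equals the full contraction, while the multiply-add count grows from 62 to 152 because the 30-operation Q·R step is repeated for each of the 4 values of a, an extra (4 − 1) × 30 = 90 operations. A different contraction path or a different choice of sliced index changes this overhead, which is the quantity a slicing strategy tries to keep small.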