This paper proposes two quantum operation scheduling methods for accelerating parallel state-vector-based quantum circuit simulation on multiple graphics processing units (GPUs). The proposed methods reduce the all-to-all communication caused by qubit reordering (QR), which can dominate the overhead of parallel simulation. Our approach eliminates redundant QRs by intentionally delaying QR communications so that multiple QRs can be aggregated into a single QR. The delays are introduced according to the principles of time-space tiling, a cache optimization technique for classical computers, which we use to arrange the execution order of quantum operations. Moreover, we present an extended scheduling method for the hierarchical interconnection of GPU cluster systems that avoids slow inter-node communication. We tailor these methods to the two primary procedures in variational quantum eigensolver (VQE) simulation: quantum state update (QSU) and expectation value computation (EVC). Experimental validation on 32-GPU executions demonstrates speedups of up to 54$\times$ in QSU and up to 606$\times$ in EVC compared to existing methods. In addition, our extended scheduling method further reduces communication time by up to 15\% in a cluster system with a two-layered interconnect. Our approach is useful for any quantum circuit simulation that includes QSU and/or EVC.
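The idea of delaying QRs so that several of them collapse into one can be illustrated with a toy model. The sketch below is not the scheduling algorithm of this paper. It is a minimal illustration under stated assumptions: a gate is represented only by the qubits it touches, gates that share no qubit may be reordered freely, qubits `0..n_local-1` are local at the start, and a single QR can install any set of at most `n_local` local qubits. The names `pick_local` and `count_reorderings` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Gate = Tuple[int, ...]  # a gate is identified only by the qubits it acts on


def pick_local(pending: Sequence[Gate], n_local: int) -> Set[int]:
    """Pick the local qubits for the next QR: cover a prefix of pending gates.

    Toy policy (an assumption): unused slots are simply left unspecified.
    """
    chosen: Set[int] = set()
    for gate in pending:
        if len(chosen | set(gate)) > n_local:
            break
        chosen |= set(gate)
    return chosen


def count_reorderings(gates: Sequence[Gate], n_local: int, delay: bool = True) -> int:
    """Count the QRs needed to run `gates` with `n_local` local qubits.

    delay=False: run gates in program order and reorder at the first
                 gate that touches a non-local qubit.
    delay=True:  postpone the QR and first run every later gate that is
                 local and does not share a qubit with a skipped gate.
    """
    if any(len(set(g)) > n_local for g in gates):
        raise ValueError("every gate must fit in the local qubits")
    local: Set[int] = set(range(n_local))
    pending: List[Gate] = list(gates)
    n_qr = 0
    while pending:
        blocked: Set[int] = set()   # qubits of gates skipped in this pass
        remaining: List[Gate] = []
        for i, gate in enumerate(pending):
            qubits = set(gate)
            if qubits <= local and not (qubits & blocked):
                continue            # gate is executed without communication
            if not delay:
                remaining = pending[i:]
                break               # in-order execution stops here
            blocked |= qubits       # later gates on these qubits must wait
            remaining.append(gate)
        pending = remaining
        if pending:
            local = pick_local(pending, n_local)
            n_qr += 1               # one all-to-all qubit reordering
    return n_qr


# Two gate groups that alternate between qubits {0, 1} and {2, 3}.
circuit = [(0, 1), (2, 3), (0, 1), (2, 3)]
in_order = count_reorderings(circuit, n_local=2, delay=False)
delayed = count_reorderings(circuit, n_local=2, delay=True)
print(in_order, delayed)
```

In this example, in-order execution reorders each time the circuit switches between the two qubit pairs, whereas the delayed schedule first runs both gates on qubits 0 and 1 and then needs only one QR for the gates on qubits 2 and 3.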