Tensor networks provide an adaptable framework for emulating quantum circuits. By partitioning exponentially large registers and gates into smaller tensors, they enable fast transformations through tensor algebra and grant fine control over memory, runtime and accuracy. Because of this inherently lower memory footprint, distributed-memory tensor network methods remain underdeveloped. While certain parallel techniques exist, they are usually limited to direct contraction and sampling problems, and a more general approach is needed for tensor representations such as matrix product states (MPS), which efficiently approximate full quantum state evolution. In this study, we expand the MPS site tensors beyond local memory by introducing a tensor-parallel distribution scheme, in which individual dense tensors are evenly scattered along a subset of their indices. The scheme is further facilitated by replacing the slower singular value decomposition (SVD) with pivoted QR factorisation. We demonstrate the capabilities of our approach by approximately emulating Google's classically difficult random circuit sampling (RCS) benchmark. A maximum bond dimension of 16,384 is reached on 32 nodes of ARCHER2, surpassing the accuracy of state-of-the-art methods by three orders of magnitude. We also show how this helps advance experiments involving more practical quantum phase estimation circuits. Our approach has the potential to enhance numerous algorithms based on dense tensor networks, offering a scalable and naturally load-balanced distribution formula. It is also compatible with other types of parallelism, unlocking new opportunities to push the quantum-classical computational phase boundary.
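To illustrate the idea of truncating a bond with pivoted QR rather than SVD, the following is a minimal single-process sketch in NumPy. It is not the distributed implementation described above: the function name `pivoted_qr_truncate`, the tolerance parameter, the matrix sizes and the greedy Gram-Schmidt formulation of column pivoting are all assumptions made for illustration. The matrix `theta` stands in for a merged two-site MPS tensor reshaped into a matrix.

```python
import numpy as np


def pivoted_qr_truncate(a, max_rank, rel_tol=1e-12):
    """Greedy column-pivoted QR (modified Gram-Schmidt), stopped at max_rank.

    Hypothetical helper for illustration. Returns q (m x k) with orthonormal
    columns and r (k x n), columns kept in their original order, such that
    a is approximately q @ r with k <= max_rank.
    """
    work = np.array(a, dtype=complex)          # residual, updated in place
    m, n = work.shape
    k_max = min(max_rank, m, n)
    q = np.zeros((m, k_max), dtype=complex)
    r = np.zeros((k_max, n), dtype=complex)
    threshold = rel_tol * np.linalg.norm(work, axis=0).max()
    rank = 0
    for _ in range(k_max):
        norms = np.linalg.norm(work, axis=0)
        pivot = int(np.argmax(norms))          # column with the largest residual
        if norms[pivot] <= threshold:
            break                              # remaining residual is negligible
        qk = work[:, pivot] / norms[pivot]     # new orthonormal direction
        rk = qk.conj() @ work                  # coefficients of every column
        work -= np.outer(qk, rk)               # deflate the residual
        q[:, rank] = qk
        r[rank, :] = rk
        rank += 1
    return q[:, :rank], r[:rank, :]


rng = np.random.default_rng(0)
# Assumed stand-in for a merged two-site tensor reshaped to a 32 x 32 matrix.
theta = rng.standard_normal((32, 32)) + 1j * rng.standard_normal((32, 32))
chi = 8                                        # target bond dimension

q, r = pivoted_qr_truncate(theta, chi)
err_qr = np.linalg.norm(theta - q @ r)

# Reference: the SVD truncation is optimal in the Frobenius norm.
u, s, vh = np.linalg.svd(theta, full_matrices=False)
err_svd = np.linalg.norm(theta - (u[:, :chi] * s[:chi]) @ vh[:chi])

print(f"truncation error  QR: {err_qr:.4f}  SVD: {err_svd:.4f}")
```

In this sketch `q` would become the new left site tensor and `r` would be absorbed into the right neighbour. For a given rank, the pivoted QR error can never be smaller than the SVD error, so the trade-off is between the cost of the factorisation and the quality of the truncation.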