Ising machines are specialized computers for finding the lowest energy states of Ising spin models, onto which many practical combinatorial optimization problems can be mapped. Simulated bifurcation (SB) is a quantum-inspired parallelizable algorithm for Ising problems that enables scalable multi-chip implementations of Ising machines. However, the computational performance of a previously proposed multi-chip architecture tends to saturate as the number of chips increases for a given problem size, because computation and communication are mutually exclusive in the time domain. In this paper, we propose a streaming architecture for multi-chip implementations of SB-based Ising machines with full spin-to-spin connectivity. The data flow of in-chip computation is harmonized with the data flow of inter-chip communication, so that computation and communication overlap and the communication time is hidden. Systematic experiments demonstrate linear strong scaling of performance up to the vicinity of the ideal communication limit, which is determined only by the latency of chip-to-chip communication. Our eight-FPGA (field-programmable gate array) cluster can compute a 32,768-spin problem with a high pipeline efficiency of 97.9%. The performance of a 79-FPGA cluster for a 100,000-spin problem, projected using a theoretical performance model validated on smaller experimental clusters, is comparable to that of a state-of-the-art 100,000-spin optical Ising machine.
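To make the underlying computation concrete, the following is a minimal single-process sketch of an SB solver for a fully connected Ising problem. It is not the paper's FPGA implementation: it assumes the ballistic SB update rule (one published SB variant; the abstract does not say which variant is used), a zero external field, a nonzero coupling matrix, and a linear ramp of the bifurcation parameter. The function names, default parameters, and the coupling-strength heuristic `c0` are illustrative choices. The dense product of the coupling matrix with the position vector, computed once per time step, is the O(N^2) part that a multi-chip design with full spin-to-spin connectivity would presumably have to partition across chips.

```python
import math
import random


def ising_energy(J, s):
    """Ising energy E = -1/2 * sum_ij J[i][j] * s[i] * s[j] (no external field)."""
    n = len(s)
    return -0.5 * sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def simulated_bifurcation(J, steps=1000, dt=0.5, a0=1.0, seed=0):
    """Ballistic-SB sketch for a symmetric, zero-diagonal, nonzero coupling matrix J.

    Returns a list of spins in {-1, +1}. All defaults are illustrative assumptions.
    """
    n = len(J)
    rng = random.Random(seed)

    # Heuristic coupling scale (assumption): c0 = 0.5 * sqrt(n - 1) / ||J||_F.
    frob = math.sqrt(sum(J[i][j] ** 2 for i in range(n) for j in range(n)))
    c0 = 0.5 * math.sqrt(n - 1) / frob

    # Small random initial positions x and momenta y.
    x = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    y = [rng.uniform(-0.1, 0.1) for _ in range(n)]

    for step in range(steps):
        a = a0 * step / steps  # bifurcation parameter ramps linearly from 0 toward a0

        # Dense matrix-vector product J @ x: the all-to-all coupling term.
        coupling = [sum(J[i][j] * x[j] for j in range(n)) for i in range(n)]

        for i in range(n):
            # Symplectic Euler step: momentum first, then position.
            y[i] += (-(a0 - a) * x[i] + c0 * coupling[i]) * dt
            x[i] += a0 * y[i] * dt
            # Inelastic walls at |x| = 1: clamp the position and reset the momentum.
            if abs(x[i]) > 1.0:
                x[i] = math.copysign(1.0, x[i])
                y[i] = 0.0

    # Read out each spin as the sign of its position.
    return [1 if xi >= 0.0 else -1 for xi in x]


# Usage: an 8-spin ferromagnet (J_ij = 1 for i != j), whose ground states
# are the two fully aligned configurations.
n_spins = 8
J_ferro = [[0.0 if i == j else 1.0 for j in range(n_spins)] for i in range(n_spins)]
spins = simulated_bifurcation(J_ferro)
print(spins, ising_energy(J_ferro, spins))
```

In this sketch every spin needs the current positions of all other spins at each step, which is why chip-to-chip communication becomes the limiting factor once the work per chip is small.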