Probabilistic computers built from p-bits offer a promising path for combinatorial optimization, but the dense connectivity required by real-world problems scales poorly in hardware. Here, we address this through graph sparsification with auxiliary copy variables and demonstrate two fully on-chip parallel tempering solvers on an FPGA. Targeting MIMO detection, a dense, NP-hard problem central to wireless communications, we first fit 11 temperature replicas of a 128-node sparsified system (1,408 p-bits) on-chip and achieve bit error rates significantly below those of conventional linear detectors on $64 \times 64$ BPSK MIMO. We report complete end-to-end solution times of 3~ms per instance, including all loading, sampling, readout, and verification overheads. ASIC projections in 7~nm technology indicate 103~MHz operation at 285.8~mW, suggesting that massive parallelism across multiple chips could approach the throughput demands of next-generation wireless systems. Sparsification, however, introduces a sharp sensitivity to the copy-constraint strength $P$, which requires manual tuning. To eliminate this bottleneck, we use Two-Dimensional Parallel Tempering (2D-PT), which exchanges replicas across both the temperature ($\beta$) and constraint ($P$) dimensions. On Sherrington--Kirkpatrick spin glasses, 2D-PT converges roughly $250\times$ faster than optimally tuned 1D-PT, and on $128 \times 128$ MIMO it reaches zero bit errors at high SNR where 1D-PT exhibits an error floor. We further validate 2D-PT entirely on-chip with 54 replicas (1,728 p-bits) on a $16 \times 16$ MIMO instance, where it tracks the maximum-likelihood bound in just 50 Monte Carlo steps, $10\times$ fewer than 1D-PT, at a projected 111~MHz and 124~mW in 7~nm. Together, these results establish an on-chip p-bit architecture and a scalable, tuning-free algorithmic framework for dense combinatorial optimization.
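To illustrate what exchanging replicas across both $\beta$ and $P$ can look like, here is a minimal sketch of a swap test based on the standard Metropolis replica-exchange criterion. It is not taken from the paper: the energy split $H(s; P) = E_0(s) + P\,E_c(s)$ (problem energy plus weighted copy-constraint penalty) and the names `swap_log_ratio` and `accept_swap` are assumptions made for illustration, and the on-chip implementation may differ.

```python
import math
import random


def swap_log_ratio(beta_a, p_a, beta_b, p_b, e0_a, ec_a, e0_b, ec_b):
    """Log acceptance ratio for exchanging the states of replicas a and b.

    Assumed (hypothetical) energy model: H(s; P) = E0(s) + P * Ec(s), where
    E0 is the problem energy and Ec is the copy-constraint penalty of state s.
    Replica k samples exp(-beta_k * H(s; P_k)).
    """
    # Total weighted energy with each replica holding its own state.
    before = beta_a * (e0_a + p_a * ec_a) + beta_b * (e0_b + p_b * ec_b)
    # Total weighted energy after the two states are exchanged.
    after = beta_a * (e0_b + p_a * ec_b) + beta_b * (e0_a + p_b * ec_a)
    return before - after


def accept_swap(log_ratio, rng=random):
    """Metropolis rule: accept with probability min(1, exp(log_ratio))."""
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)


# Neighbors along the temperature axis share P and differ in beta.
d_beta = swap_log_ratio(2.0, 1.0, 1.0, 1.0, e0_a=3.0, ec_a=0.0, e0_b=1.0, ec_b=0.0)

# Neighbors along the constraint axis share beta and differ in P.
d_p = swap_log_ratio(1.0, 4.0, 1.0, 1.0, e0_a=0.0, ec_a=2.0, e0_b=0.0, ec_b=0.0)
```

Under this assumed model, the two special cases reduce to $(\beta_a - \beta_b)\,(H_a - H_b)$ for a temperature swap and $\beta\,(P_a - P_b)\,(E_{c,a} - E_{c,b})$ for a constraint swap, so a single rule covers both axes of the replica grid.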