Probabilistic Computers for MIMO Detection: From Sparsification to 2D Parallel Tempering

Probabilistic computers built from p-bits offer a promising path for combinatorial optimization, but the dense connectivity required by real-world problems scales poorly in hardware. Here, we address this through graph sparsification with auxiliary copy variables and demonstrate two fully on-chip parallel tempering solvers on an FPGA. Targeting MIMO detection, a dense, NP-hard problem central to wireless communications, we first fit 11 temperature replicas of a 128-node sparsified system (1,408 p-bits) on-chip and achieve bit error rates significantly below those of conventional linear detectors on $64 \times 64$ BPSK MIMO. We report complete end-to-end solution times of 3~ms per instance, including all loading, sampling, readout, and verification overheads. ASIC projections in 7~nm technology indicate 103~MHz operation at 285.8~mW, suggesting that massive parallelism across multiple chips could approach the throughput demands of next-generation wireless systems. Sparsification, however, introduces a sharp sensitivity to the copy-constraint strength $P$, which must be tuned manually. To eliminate this bottleneck, we use Two-Dimensional Parallel Tempering (2D-PT), which exchanges replicas across both the temperature ($\beta$) and constraint ($P$) dimensions. On Sherrington--Kirkpatrick spin glasses, 2D-PT converges roughly $250\times$ faster than optimally tuned 1D-PT, and on $128 \times 128$ MIMO it reaches zero bit errors at high SNR, where 1D-PT exhibits an error floor. We further validate 2D-PT entirely on-chip with 54 replicas (1,728 p-bits) on a $16 \times 16$ MIMO instance, where it tracks the maximum-likelihood bound within just 50 Monte Carlo steps ($10\times$ fewer than 1D-PT), with projected operation at 111~MHz and 124~mW in 7~nm. Together, these results establish an on-chip p-bit architecture and a scalable, tuning-free algorithmic framework for dense combinatorial optimization.
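The abstract compresses several mechanisms into a sentence each. The following is a minimal, hypothetical sketch in pure Python, not the authors' FPGA implementation, illustrating three of them: the standard mapping of BPSK MIMO detection onto an Ising energy, the common p-bit update rule, and a generic replica-exchange acceptance test that covers both swap directions of 2D-PT. It assumes a real-valued channel model and assumes each replica's energy has the form $E_k(s) = E_0(s) + P_k\,C(s)$, where $C(s)$ is a copy-constraint penalty (for example, the number of disagreeing copy pairs). All function names are illustrative, and the copy-variable sparsification itself is not reproduced.

```python
import math
import random


def mimo_to_ising(H, y):
    """Map real-valued BPSK MIMO detection, min ||y - H s||^2 over s in {-1,+1}^N,
    to Ising couplings J (symmetric, zero diagonal) and biases h such that
    ||y - H s||^2 = E(s) + ||y||^2 + trace(H^T H), with E(s) = -1/2 s^T J s - h^T s.
    Assumption: H and y are real-valued (lists of lists / lists of floats)."""
    M, N = len(H), len(H[0])
    # Gram matrix G = H^T H; off-diagonal entries become couplings J_ij = -2 G_ij
    G = [[sum(H[k][i] * H[k][j] for k in range(M)) for j in range(N)] for i in range(N)]
    J = [[0.0 if i == j else -2.0 * G[i][j] for j in range(N)] for i in range(N)]
    # Matched filter H^T y gives the biases h_i = 2 (H^T y)_i
    h = [2.0 * sum(H[k][i] * y[k] for k in range(M)) for i in range(N)]
    return J, h


def ising_energy(J, h, s):
    """E(s) = -1/2 sum_ij J_ij s_i s_j - sum_i h_i s_i."""
    n = len(s)
    pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    return -0.5 * pair - sum(h[i] * s[i] for i in range(n))


def pbit_sweep(J, h, s, beta, rng):
    """One sweep of sequential p-bit updates (in place):
    s_i = sgn(tanh(beta * I_i) - u), u ~ Uniform(-1, 1), I_i = sum_j J_ij s_j + h_i.
    This samples s_i from its Gibbs conditional at inverse temperature beta."""
    for i in range(len(s)):
        I = sum(J[i][j] * s[j] for j in range(len(s))) + h[i]
        s[i] = 1 if math.tanh(beta * I) > rng.uniform(-1.0, 1.0) else -1
    return s


def swap_accept_prob(beta_a, P_a, beta_b, P_b, E0_a, C_a, E0_b, C_b):
    """Metropolis probability of exchanging configurations between replicas a and b,
    where replica k has energy E_k(s) = E0(s) + P_k * C(s). E0_x and C_x are the
    problem energy and constraint penalty of the configuration currently held by x.
    A beta-swap (P_a == P_b) and a P-swap (beta_a == beta_b) are special cases."""
    def E(P, E0, C):
        return E0 + P * C
    log_r = (-beta_a * (E(P_a, E0_b, C_b) - E(P_a, E0_a, C_a))
             - beta_b * (E(P_b, E0_a, C_a) - E(P_b, E0_b, C_b)))
    return 1.0 if log_r >= 0 else math.exp(log_r)


if __name__ == "__main__":
    rng = random.Random(0)
    N = 4
    H = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(N)]
    s_true = [rng.choice([-1, 1]) for _ in range(N)]
    y = [sum(H[k][i] * s_true[i] for i in range(N)) for k in range(N)]  # noiseless
    J, h = mimo_to_ising(H, y)
    s = [rng.choice([-1, 1]) for _ in range(N)]
    # Simple single-replica annealing schedule for illustration; this is not PT.
    for beta in [0.1 * t for t in range(1, 51)]:
        pbit_sweep(J, h, s, beta, rng)
    print("true:", s_true, "p-bit estimate:", s)
```

With `P_a == P_b`, the exponent reduces to the familiar temperature-swap form $(\beta_a - \beta_b)(E_a - E_b)$; with `beta_a == beta_b`, it becomes $\beta\,(P_a - P_b)(C_a - C_b)$, which is what lets replicas move along the constraint axis instead of relying on a single hand-tuned $P$.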
