Quantum-classical interfaces (QCIs) for fault-tolerant quantum computing must decode thousands to millions of logical qubits simultaneously and in real time. Scaling these architectures requires sharing expensive decoding resources among logical qubits, which introduces severe resource contention within the QCI. Distributing those resources efficiently remains an open challenge, but lightweight predecoding promises to relieve the strain on shared decoding components by reducing both average latency and decoder usage. To date, research into both decoder allocation and predecoding has been confined to the surface code. As emphasis shifts toward general quantum low-density parity-check (qLDPC) codes, their slower decoding will intensify resource contention, while their inherent complexity will make manual predecoder design infeasible. To address this gap, we introduce an automated framework that generates predecoders for arbitrary qLDPC codes. The automatically constructed predecoders handle over 90% of the decoding workload on their own, cutting overall decoder utilization by up to 3,963x. This includes a reduction of up to 72.71% in computationally demanding ordered statistics decoding (OSD). Furthermore, we detail a highly efficient, pipelined hardware design that concurrently decodes approximately 1,200 bivariate bicycle (BB) code logical qubits on a single FPGA. When implemented as a cryogenic ASIC, the architecture scales to support between 36,000 and 360,000 BB code logical qubits while operating within a 1.5 W power limit at 4 K.
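The offloading idea behind predecoding can be illustrated with a toy sketch. It is not the automated framework described above: it assumes a classical 5-bit repetition code as a stand-in for a qLDPC code, a hand-built lookup table as the "predecoder", and a brute-force search as the "expensive" decoder. All names (`build_predecoder_table`, `full_decoder`, `offload_fraction`) are hypothetical and chosen only for illustration.

```python
import itertools
import random

# Toy parity-check matrix of a 5-bit repetition code (rows = checks, columns = bits).
# Assumption: a classical stand-in chosen for brevity, NOT a qLDPC or BB code.
H = [
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
]
N = len(H[0])


def syndrome(error):
    """Return the parity of each check for a given bit-flip pattern."""
    return tuple(sum(h * e for h, e in zip(row, error)) % 2 for row in H)


def build_predecoder_table():
    """Map the syndromes of the zero error and every single-bit error to a correction."""
    table = {syndrome([0] * N): [0] * N}
    for i in range(N):
        e = [0] * N
        e[i] = 1
        table[syndrome(e)] = e
    return table


def full_decoder(s):
    """Brute-force minimum-weight decoder (hypothetical stand-in for an expensive decoder)."""
    candidates = sorted(itertools.product([0, 1], repeat=N), key=sum)
    for e in candidates:
        if syndrome(e) == s:
            return list(e)
    raise ValueError("no error pattern matches this syndrome")


def decode(s, table):
    """Try the predecoder first; fall back to the full decoder on a miss."""
    if s in table:
        return table[s], True
    return full_decoder(s), False


def offload_fraction(p, shots, seed=0):
    """Estimate the fraction of shots resolved without calling the full decoder."""
    rng = random.Random(seed)
    table = build_predecoder_table()
    handled = 0
    for _ in range(shots):
        error = [1 if rng.random() < p else 0 for _ in range(N)]
        _, used_predecoder = decode(syndrome(error), table)
        handled += used_predecoder
    return handled / shots
```

Calling `offload_fraction(p, shots)` with a small physical error rate `p` shows most shots being absorbed by the table, since low-weight errors dominate. The sketch measures only how often the full decoder is bypassed, not the logical accuracy of the resulting corrections.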