Domain-specialized FPGAs have delivered unprecedented performance for low-latency inference across scientific and industrial workloads, yet nearly all existing accelerators assume static models trained offline, relegating learning and adaptation to slower CPUs or GPUs. This separation fundamentally limits systems that must operate in non-stationary, high-frequency environments, where model updates must occur at the timescale of the underlying physics. In this paper, I argue for a shift from inference-only accelerators to ultrafast on-chip learning, in which both inference and training execute directly within the FPGA fabric under deterministic, sub-microsecond latency constraints. Bringing learning into the same real-time datapath as inference would enable closed-loop systems that adapt as fast as the physical processes they control, with applications spanning quantum error correction, cryogenic qubit calibration, plasma and fusion control, accelerator tuning, and autonomous scientific experiments. Enabling such regimes requires rethinking algorithms, architectures, and toolflows jointly, but promises to transform FPGAs from static inference engines into real-time learning machines.