Enabling Physical AI at the Edge: Hardware-Accelerated Recovery of System Dynamics

Physical AI at the edge -- enabling autonomous systems to understand and predict real-world dynamics in real time -- requires hardware-efficient learning and inference. Model recovery (MR), which identifies governing equations from sensor data, is a key primitive for safe and explainable monitoring in mission-critical autonomous systems operating under strict latency, compute, and power constraints. However, state-of-the-art MR methods (e.g., EMILY and PINN+SR) rely on Neural ODE formulations that require iterative solvers and are difficult to accelerate efficiently on edge hardware. We present \textbf{MERINDA} (Model Recovery in Reconfigurable Dynamic Architecture), an FPGA-accelerated MR framework designed to make physical AI practical on resource-constrained devices. MERINDA replaces expensive Neural ODE components with a hardware-friendly formulation that combines (i) GRU-based discretized dynamics, (ii) dense inverse-ODE layers, (iii) sparsity-driven dropout, and (iv) lightweight ODE solvers. The resulting computation is structured for streaming parallelism, enabling critical kernels to be fully parallelized on the FPGA. Across four benchmark nonlinear dynamical systems, MERINDA delivers substantial gains over GPU implementations: \textbf{114$\times$ lower energy} (434~J vs.\ 49{,}375~J), \textbf{28$\times$ smaller memory footprint} (214~MB vs.\ 6{,}118~MB), and \textbf{1.68$\times$ faster training}, while matching state-of-the-art model-recovery accuracy. These results demonstrate that MERINDA can bring accurate, explainable MR to the edge for real-time monitoring of autonomous systems.
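
To make the notion of model recovery concrete, the following is a minimal sketch of identifying governing equations from sampled trajectories. It is not MERINDA's GRU-based formulation: it swaps in a generic sparse-regression approach (sequentially thresholded least squares over a polynomial candidate library), and the toy system, the threshold, and the helper names `library`, `stlsq`, and `recover_demo` are all assumptions made for illustration. The toy system is a damped linear oscillator, $\dot{x} = -0.1x + 2y$, $\dot{y} = -2x - 0.1y$, chosen only because its closed-form solution makes the data easy to generate.

```python
import numpy as np

def library(X):
    """Candidate terms [1, x, y, x^2, x*y, y^2] evaluated on each sample."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

def stlsq(Theta, dXdt, threshold=0.05, iters=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        # Refit each state equation using only its surviving library terms.
        for k in range(dXdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                Xi[keep, k] = np.linalg.lstsq(
                    Theta[:, keep], dXdt[:, k], rcond=None
                )[0]
    return Xi

def recover_demo():
    """Recover the coefficients of the toy oscillator from sampled states."""
    dt = 0.01
    t = np.arange(0.0, 10.0, dt)
    # Closed-form trajectory for x(0) = 2, y(0) = 0 (stands in for sensor data).
    decay = 2.0 * np.exp(-0.1 * t)
    X = np.column_stack([decay * np.cos(2 * t), -decay * np.sin(2 * t)])
    # Derivatives are estimated from the samples, not taken from the true model.
    dXdt = np.gradient(X, dt, axis=0, edge_order=2)
    return stlsq(library(X), dXdt)

Xi = recover_demo()
print(np.round(Xi, 3))  # rows: library terms, columns: dx/dt and dy/dt
```

In this sketch, only the rows for `x` and `y` should keep nonzero coefficients, close to the values used to generate the data. The abstract's point is that state-of-the-art MR methods replace this kind of simple regression with Neural ODE formulations, whose iterative solvers are the part that is hard to accelerate on edge hardware.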
