Robotic policies deployed in real-world environments often encounter post-training faults, where retraining, exploration, and system identification are impractical. We introduce an inference-time, cerebellar-inspired residual control framework that augments a frozen reinforcement learning policy with online corrective actions, enabling fault recovery without modifying the base policy's parameters. The framework instantiates core cerebellar principles: high-dimensional pattern separation via fixed feature expansion, parallel microzone-style residual pathways, and local error-driven plasticity with excitatory and inhibitory eligibility traces operating at distinct time scales. These mechanisms enable fast, localized correction under post-training disturbances while avoiding destabilizing global policy updates. A conservative, performance-driven meta-adaptation mechanism regulates residual authority and plasticity, preserving nominal behavior and suppressing unnecessary intervention. Experiments on MuJoCo benchmarks under actuator, dynamics, and environmental perturbations show improvements of up to $+66\%$ on \texttt{HalfCheetah-v5} and $+53\%$ on \texttt{Humanoid-v5} under moderate faults, with graceful degradation under severe shifts and complementary robustness from consolidating persistent residual corrections into the policy parameters.