Bayesian Controller Fusion: Leveraging Control Priors in Deep Reinforcement Learning for Robotics

We present Bayesian Controller Fusion (BCF): a hybrid control strategy that combines the strengths of traditional hand-crafted controllers and model-free deep reinforcement learning (RL). BCF thrives in the robotics domain, where reliable but suboptimal control priors exist for many tasks, but RL from scratch remains unsafe and data-inefficient. By fusing uncertainty-aware distributional outputs from each system, BCF arbitrates control between them, exploiting their respective strengths. We study BCF on two real-world robotics tasks: navigation in a vast, long-horizon environment, and a complex reaching task that involves manipulability maximisation. In both domains, simple hand-crafted controllers exist that can solve the task in a risk-averse manner but do not necessarily achieve the optimal solution, given limitations in analytical modelling, controller miscalibration and task variation. Because exploration is naturally guided by the prior in the early stages of training, BCF accelerates learning, and as the policy gains more experience it improves substantially beyond the performance of the control prior. More importantly, given the risk-averse nature of the control prior, BCF ensures safe exploration and deployment: the control prior naturally dominates the action distribution in states unknown to the policy. We additionally show BCF's applicability to the zero-shot sim-to-real setting and its ability to deal with out-of-distribution states in the real world. BCF is a promising approach towards combining the complementary strengths of deep RL and traditional robotic control, surpassing what either can achieve independently. The code and supplementary video material are publicly available at https://krishanrana.github.io/bcf.
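To make the idea of fusing uncertainty-aware distributional outputs concrete, the following is a minimal sketch that assumes each system outputs a one-dimensional Gaussian over the action and that the two are combined as a normalised product of Gaussians (precision-weighted averaging). The abstract does not state the fusion rule, so this choice is an assumption, and the names `Gaussian` and `fuse` as well as all numbers are hypothetical illustrations rather than the released implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    """A 1-D Gaussian over a single action dimension (illustrative type)."""
    mean: float
    std: float


def fuse(policy: Gaussian, prior: Gaussian) -> Gaussian:
    """Normalised product of two 1-D Gaussians.

    Assumed fusion rule: each mean is weighted by its precision
    (inverse variance), so the more certain system dominates.
    """
    w_policy = 1.0 / policy.std ** 2
    w_prior = 1.0 / prior.std ** 2
    total = w_policy + w_prior
    mean = (w_policy * policy.mean + w_prior * prior.mean) / total
    return Gaussian(mean=mean, std=total ** -0.5)


# Hypothetical numbers: the control prior proposes action 0.0,
# while the learned policy proposes action 1.0.
prior = Gaussian(mean=0.0, std=0.5)
unfamiliar = fuse(Gaussian(mean=1.0, std=2.0), prior)  # policy is uncertain
familiar = fuse(Gaussian(mean=1.0, std=0.1), prior)    # policy is confident
```

Under these assumed numbers, the fused mean stays close to the prior's action (about 0.06) when the policy is uncertain, and moves to the policy's action (about 0.96) once the policy is confident. This mirrors the behaviour described above, where the prior dominates in states unknown to the policy.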
