Learning-based control has gained significant attention in robotics in recent years because it can address complex tasks in real-world environments. With advances in machine learning algorithms and computational capabilities, this approach is becoming increasingly important for solving challenging control problems by learning unknown or partially known robot dynamics. Active exploration, in which a robot directs itself to the states that yield the highest information gain, is essential for collecting data efficiently and minimizing human supervision. Uncertainty-aware deployment is likewise a growing concern in robotic control, because actions chosen from an uncertain learned model can lead to unstable motions or failure. However, active exploration and uncertainty-aware deployment have largely been studied independently, and there is limited literature that integrates them seamlessly. This paper presents a unified model-based reinforcement learning framework that bridges these two tasks in robotic control. Our framework uses a probabilistic ensemble neural network for dynamics learning, which allows epistemic uncertainty to be quantified via the Jensen-Rényi divergence. The two opposing tasks of exploration and deployment are optimized through state-of-the-art sampling-based model predictive control (MPC), resulting in efficient collection of training data and successful avoidance of uncertain regions of the state-action space. We conduct experiments on both autonomous vehicles and wheeled robots, showing promising results for both exploration and deployment.
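To make the uncertainty measure concrete, the following is a minimal sketch of how ensemble disagreement could be scored with a Jensen-Rényi divergence. It is an illustration under stated assumptions, not the implementation used in this work: it assumes each ensemble member predicts a diagonal Gaussian over the next state, weights the members equally, and uses the quadratic (order 2) Rényi entropy, chosen here because it admits a closed form for Gaussian mixtures. The names `jensen_renyi_divergence` and `uncertainty_term` and the weight `weight` are hypothetical.

```python
import numpy as np


def jensen_renyi_divergence(means, variances):
    """Quadratic-Renyi JRD of N equally weighted diagonal Gaussians.

    means, variances: array-likes of shape (N, d), one row per ensemble member.
    Assumption: order-2 Renyi entropy, H2(p) = -log integral p(x)^2 dx.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    n = means.shape[0]

    # Pairwise Gaussian overlaps: log N(mu_i - mu_j; 0, var_i + var_j).
    diff = means[:, None, :] - means[None, :, :]
    var_sum = variances[:, None, :] + variances[None, :, :]
    log_overlap = -0.5 * np.sum(
        np.log(2.0 * np.pi * var_sum) + diff**2 / var_sum, axis=-1
    )

    # H2 of the mixture: -log( (1/N^2) * sum_ij overlap_ij ), via log-sum-exp.
    peak = log_overlap.max()
    log_integral = peak + np.log(np.exp(log_overlap - peak).sum()) - 2.0 * np.log(n)
    mixture_entropy = -log_integral

    # H2 of each member: 0.5 * sum_k log(4 * pi * var_k).
    member_entropy = 0.5 * np.sum(np.log(4.0 * np.pi * variances), axis=-1)

    return mixture_entropy - member_entropy.mean()


def uncertainty_term(jrd, mode, weight=1.0):
    """Hypothetical reward shaping: seek uncertainty to explore, avoid it to deploy."""
    if mode == "explore":
        return weight * jrd
    if mode == "deploy":
        return -weight * jrd
    raise ValueError(f"unknown mode: {mode}")
```

In this sketch the divergence is zero when all members make the same prediction and approaches log N when the N members have equal variances and their means are far apart. A sampling-based planner could add `uncertainty_term` to each candidate trajectory's return, with the sign flipped between exploration and deployment.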