Reinforcement learning (RL) has shown strong performance in robotic manipulation, but learned policies often degrade when test conditions differ from the training distribution. This limitation is especially important in contact-rich tasks such as pushing and pick-and-place, where changes in goals, contact conditions, or robot dynamics can drive the system out of distribution at inference time. In this paper, we investigate a hybrid controller that combines RL with bounded extremum seeking (ES) to improve robustness under such conditions. In the proposed approach, deep deterministic policy gradient (DDPG) policies are trained under standard conditions on robotic pushing and pick-and-place tasks and are then combined with bounded ES during deployment. The RL policy provides fast manipulation behavior, while bounded ES ensures robustness of the overall controller to time variations when operating conditions depart from those seen during training. The resulting controller is evaluated under several out-of-distribution settings, including time-varying goals and spatially varying friction patches.
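To make the bounded ES component concrete, the following is a minimal sketch of a commonly used bounded ES update law, dx_i/dt = sqrt(alpha * omega_i) * cos(omega_i * t + k * C(x, t)), applied to a toy two-dimensional point that tracks a slowly moving goal. It is an illustration under stated assumptions, not the controller evaluated in this paper: the cost function, gains, dither frequencies, and goal trajectory are all hypothetical, and the way the ES output is combined with the DDPG policy is not specified here, so the policy is omitted.

```python
import math

def bounded_es_step(x, cost, t, dt, omegas, alpha=0.5, k=1.0):
    """One forward-Euler step of dx_i/dt = sqrt(alpha*w_i) * cos(w_i*t + k*cost).

    The cost enters only through the phase of the cosine, so each component
    of the update rate is bounded by sqrt(alpha*w_i) whatever the cost value.
    """
    rates = [math.sqrt(alpha * w) * math.cos(w * t + k * cost) for w in omegas]
    x_next = [xi + dt * r for xi, r in zip(x, rates)]
    return x_next, rates

def goal(t):
    # Hypothetical slowly time-varying goal (assumption for illustration).
    return (0.3 * math.sin(0.2 * t), 0.3 * math.cos(0.2 * t))

def cost_fn(x, t):
    # Squared distance to the current goal; only this scalar is measured.
    gx, gy = goal(t)
    return (x[0] - gx) ** 2 + (x[1] - gy) ** 2

def run(T=20.0, dt=1e-3):
    omegas = (100.0, 130.0)  # distinct dither frequencies, one per coordinate
    x = [1.0, -1.0]
    peak_rate = 0.0
    for n in range(int(round(T / dt))):
        t = n * dt
        x, rates = bounded_es_step(x, cost_fn(x, t), t, dt, omegas)
        peak_rate = max(peak_rate, max(abs(r) for r in rates))
    final_error = math.sqrt(cost_fn(x, T))
    return x, final_error, peak_rate

if __name__ == "__main__":
    x, final_error, peak_rate = run()
    print("final distance to goal:", final_error)
    print("largest update rate seen:", peak_rate)
```

The update uses only the measured cost value, with no gradient and no model of the goal motion, and its magnitude stays bounded by design. Those two properties are the usual motivation for pairing bounded ES with a learned policy when conditions drift away from the training distribution.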