Classical pixel-based Visual Servoing (VS) approaches offer high accuracy but suffer from a limited convergence area due to the nonlinearity of the underlying optimization. Modern deep learning-based VS methods overcome traditional vision issues but lack scalability, as they require training on a limited set of scenes. This paper proposes a hybrid VS strategy that combines Deep Reinforcement Learning (DRL) and optimal control to enlarge the convergence area and improve scalability. The DRL component handles representation learning and policy learning separately, which improves scalability, generalizability, and learning efficiency and eases domain adaptation. The optimal control component ensures high end-point accuracy. On a 7-DOF manipulator, our method achieves high convergence rates and minimal end-positioning errors. Importantly, it scales to more than 1000 distinct scenes. Furthermore, we demonstrate that it generalizes to previously unseen datasets. Lastly, we illustrate the real-world applicability of our approach, highlighting its adaptability through single-shot domain transfer learning in environments with noise and occlusions. Real-robot experiments can be found at \url{https://sites.google.com/view/vsls}.