Demonstration-Guided Reinforcement Learning with Efficient Exploration for Task Automation of Surgical Robot

Task automation of surgical robots has the potential to improve surgical efficiency. Recent reinforcement learning (RL) approaches provide scalable solutions to surgical automation, but typically require extensive data collection to solve a task when no prior knowledge is given. This issue is known as the exploration challenge, and it can be alleviated by providing expert demonstrations to an RL agent. Yet how to use demonstration data effectively to improve exploration efficiency remains an open challenge. In this work, we introduce Demonstration-guided EXploration (DEX), an efficient reinforcement learning algorithm that aims to overcome the exploration problem with expert demonstrations for surgical automation. To exploit demonstrations effectively, our method assigns higher value estimates to expert-like behaviors to facilitate productive interactions, and adopts non-parametric regression to enable such guidance at states unobserved in the demonstration data. Extensive experiments on 10 surgical manipulation tasks from SurRoL, a comprehensive surgical simulation platform, demonstrate significant improvements in the exploration efficiency and task success rates of our method. We also deploy the learned policies to the da Vinci Research Kit (dVRK) platform to show their effectiveness on a real robot. Code is available at https://github.com/med-air/DEX.
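
The two ideas named in the abstract, guidance at unobserved states through non-parametric regression and higher values for expert-like behavior, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which regressor or value formulation DEX uses, so the k-nearest-neighbor averaging, the Euclidean distance, the squared-distance penalty, and the names `knn_expert_action`, `guided_value`, `k`, and `alpha` are all assumptions made for illustration.

```python
import math


def knn_expert_action(state, demo_states, demo_actions, k=3):
    """Estimate an expert action at `state` by averaging the actions of the
    k demonstration states closest to it (Euclidean distance).

    Assumption: k-NN averaging stands in for the non-parametric regression
    mentioned in the abstract; the actual regressor may differ.
    """
    if not demo_states or len(demo_states) != len(demo_actions):
        raise ValueError("demo_states and demo_actions must be non-empty and aligned")
    k = min(k, len(demo_states))
    # Rank demonstration indices by distance to the query state.
    order = sorted(range(len(demo_states)),
                   key=lambda i: math.dist(state, demo_states[i]))
    nearest = order[:k]
    dim = len(demo_actions[0])
    # Component-wise mean of the k nearest expert actions.
    return [sum(demo_actions[i][d] for i in nearest) / k for d in range(dim)]


def guided_value(q_value, action, expert_action, alpha=1.0):
    """Hypothetical shaped value: subtract a penalty that grows with the
    squared distance between the agent's action and the estimated expert
    action, so expert-like actions keep higher values.
    """
    return q_value - alpha * math.dist(action, expert_action) ** 2


# Toy demonstration set: 2-D states paired with 1-D actions.
demo_states = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
demo_actions = [[1.0], [3.0], [100.0]]

# Query a state that does not appear in the demonstrations.
expert_action = knn_expert_action([0.4, 0.0], demo_states, demo_actions, k=2)
shaped = guided_value(5.0, [3.0], expert_action, alpha=0.5)
```

In this toy setup the query state lies between the first two demonstration states, so the estimated expert action is the mean of their actions, and an agent action that deviates from it receives a lower shaped value than one that matches it.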
