We consider robot learning in the context of shared autonomy, where control of the system can switch between a human teleoperator and autonomous control. In this setting we address reinforcement learning and learning from demonstration when human time has a cost. This cost represents the time a human spends teleoperating the robot or recovering it from failures. For each episode, the agent must choose between requesting human teleoperation and using one of its autonomous controllers. In our approach, we learn to predict the success probability of each controller given the initial state of an episode. A contextual multi-armed bandit algorithm uses these predictions to choose the controller for that episode. A controller is learnt online from demonstrations and reinforcement learning, so that autonomous performance improves and the system relies less on the teleoperator as it gains experience. We show that our approach to controller selection reduces the human cost of performing two simulated tasks and one real-world task.
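The abstract does not specify the success model, the cost model, or the bandit's exploration rule, so the following is only a minimal sketch of cost-aware controller selection under stated assumptions. It assumes one online logistic-regression success predictor per autonomous controller, a fixed human cost for teleoperating an episode (`teleop_cost`), a fixed human cost for recovering the robot after an autonomous failure (`failure_cost`), and epsilon-greedy exploration as a stand-in for whatever contextual bandit strategy the authors use. All names here (`OnlineLogistic`, `select_controller`, `TELEOP`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

TELEOP = -1  # hypothetical sentinel meaning "request human teleoperation"


@dataclass
class OnlineLogistic:
    """Hypothetical per-controller model of P(success | initial state)."""
    dim: int
    lr: float = 0.5
    w: List[float] = field(default_factory=list)
    b: float = 0.0

    def __post_init__(self) -> None:
        if not self.w:
            self.w = [0.0] * self.dim

    def predict(self, x: Sequence[float]) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        # Numerically stable sigmoid.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, x: Sequence[float], success: bool) -> None:
        # One SGD step on the log-loss; its gradient is (p - y) * x.
        err = self.predict(x) - (1.0 if success else 0.0)
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def select_controller(
    models: List[OnlineLogistic],
    x: Sequence[float],
    teleop_cost: float,
    failure_cost: float,
    epsilon: float = 0.1,
    rng: Optional[random.Random] = None,
) -> int:
    """Return an autonomous controller index, or TELEOP.

    Assumed cost model: an autonomous success costs no human time, an
    autonomous failure costs failure_cost, teleoperation costs teleop_cost.
    """
    if rng is None:
        rng = random.Random()
    # Epsilon-greedy exploration over autonomous controllers (an assumption).
    if models and rng.random() < epsilon:
        return rng.randrange(len(models))
    # Expected human cost of running each controller from initial state x.
    costs = [(1.0 - m.predict(x)) * failure_cost for m in models]
    best = min(range(len(models)), key=costs.__getitem__, default=None)
    if best is None or costs[best] >= teleop_cost:
        return TELEOP
    return best
```

With untrained models every controller predicts a success probability of 0.5, so with `teleop_cost=3.0` and `failure_cost=10.0` the expected autonomous cost (5.0) exceeds the teleoperation cost and the sketch asks the human. After a controller's model has been updated with several successes from similar initial states, its expected cost falls below `teleop_cost` and it is chosen instead. In a full system, teleoperated episodes would also supply demonstrations for training the controllers, and the outcome of each autonomous episode would be fed back through `update`.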