Self-assessment rules, which verify the feasibility of a selected action before it is actually executed, play an essential role in safe and effective real-world robotic applications. How to use the self-assessment results to re-choose actions, however, remains a challenge. Previous methods eliminate any selected action that the self-assessment rules evaluate as failed and re-choose the action with the next-highest affordance (i.e., the process-of-elimination strategy [1]), which ignores the dependency between the self-assessment results and the remaining untried actions. This dependency is important because previous failures may help trim the remaining over-estimated actions. In this paper, we set out to investigate this dependency by learning a failure-aware policy. We propose two architectures for the failure-aware policy, which represent the self-assessment results of previous failures as a variable state and leverage recurrent neural networks to implicitly memorize the previous failures. Experiments conducted on three tasks demonstrate that our method achieves higher task success rates with fewer trials. Moreover, when the actions are correlated, learning a failure-aware policy can achieve better performance than the process-of-elimination strategy.
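To make the contrast concrete, the following is a minimal sketch of the process-of-elimination baseline next to a hand-crafted failure-aware variant. It is an illustration under stated assumptions, not the learned policy proposed in this paper: the function names, the `similarity` callback, the `penalty` value, and the toy affordance scores are all hypothetical, and the fixed penalty merely stands in for whatever a learned policy would infer from previous failures.

```python
from typing import Callable, List, Sequence


def process_of_elimination(
    affordance: Sequence[float],
    is_feasible: Callable[[int], bool],
) -> List[int]:
    """Try actions in descending affordance order until one passes self-assessment."""
    remaining = list(range(len(affordance)))
    tried: List[int] = []
    while remaining:
        action = max(remaining, key=lambda a: affordance[a])
        tried.append(action)
        if is_feasible(action):  # self-assessment rule accepts the action
            break
        remaining.remove(action)  # failed action is eliminated; scores stay fixed
    return tried


def failure_aware_heuristic(
    affordance: Sequence[float],
    is_feasible: Callable[[int], bool],
    similarity: Callable[[int, int], float],  # hypothetical correlation measure
    penalty: float = 0.5,  # hypothetical hand-tuned constant
) -> List[int]:
    """Same loop, but each failure also lowers the scores of correlated untried actions."""
    scores = list(affordance)
    remaining = list(range(len(scores)))
    tried: List[int] = []
    while remaining:
        action = max(remaining, key=lambda a: scores[a])
        tried.append(action)
        if is_feasible(action):
            break
        remaining.remove(action)
        for other in remaining:  # propagate the failure to similar actions
            scores[other] -= penalty * similarity(action, other)
    return tried


# Toy example (all values hypothetical): actions 0-2 are correlated and
# over-estimated; only action 4 passes the self-assessment rule.
toy_affordance = [0.9, 0.85, 0.8, 0.3, 0.6, 0.2]


def toy_feasible(action: int) -> bool:
    return action == 4


def toy_similarity(a: int, b: int) -> float:
    return 1.0 if abs(a - b) <= 2 else 0.0


baseline_trials = process_of_elimination(toy_affordance, toy_feasible)
aware_trials = failure_aware_heuristic(toy_affordance, toy_feasible, toy_similarity)
print(baseline_trials)  # [0, 1, 2, 4]
print(aware_trials)     # [0, 4]
```

In this toy setting the baseline spends three trials on the correlated, over-estimated actions before reaching the feasible one, whereas the failure-aware variant discounts them after the first failure and succeeds on its second trial.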