Environmental uncertainty has long been difficult to handle when performing real-world robot tasks, because it produces unexpected observations that manual scripting cannot cover. Learning-based robot control methods are a promising approach for generating flexible motions in unknown situations, but they still tend to suffer under uncertainty because of their deterministic nature. To perform the target task adaptively under such conditions, the robot control model must be able to accurately understand the possible uncertainty and to exploratively derive the optimal action that minimizes it. This paper extends an existing predictive-learning-based robot control method, which employs foresight prediction using dynamic internal simulation. The foresight module refines the model's hidden states by sampling multiple possible futures and replacing the current hidden state with the sampled one that leads to lower future uncertainty. The adaptiveness of the model was evaluated on a door-opening task. The door can be opened by pushing, pulling, or sliding, but the robot cannot visually distinguish which, and must adapt on the fly. The results showed that the proposed model adaptively diverged its motion through interaction with the door, whereas conventional methods failed to diverge stably. The models were analyzed using the Lyapunov exponents of the RNN hidden states, which reflect the possible divergence at each time step during task execution. The results indicated that the foresight module biased the model to consider future consequences, which led to uncertainty being embedded in the policy of the robot controller rather than in the resultant observations. This is beneficial for implementing adaptive behaviors, as it induces the derivation of diverse motions during exploration.
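The foresight step described in the abstract (sample several candidate futures, then keep the hidden state whose future is least uncertain) can be illustrated with a minimal sketch. The abstract does not specify the network architecture, how candidates are sampled, or how uncertainty is measured, so everything below is an assumption made for illustration: a toy tanh RNN, Gaussian perturbation of the hidden state as the sampling scheme, summed predicted variance over a closed-loop rollout as the uncertainty score, and all function and parameter names (`rnn_step`, `predict`, `rollout_uncertainty`, `foresight_refine`, `n_samples`, `horizon`, `noise`) are hypothetical.

```python
import math
import random


def random_matrix(rows, cols, rng, scale=0.5):
    """Toy weight matrix with uniform entries (stand-in for trained weights)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def rnn_step(h, x, W_h, W_x):
    """One step of a toy tanh RNN: h' = tanh(W_h h + W_x x)."""
    return [
        math.tanh(
            sum(w * hj for w, hj in zip(row_h, h))
            + sum(w * xj for w, xj in zip(row_x, x))
        )
        for row_h, row_x in zip(W_h, W_x)
    ]


def predict(h, W_mu, W_var):
    """Predicted next observation (mean) and per-dimension variance.

    Assumption: the model outputs a log-variance, so exp() keeps it positive.
    """
    mu = [sum(w * hj for w, hj in zip(row, h)) for row in W_mu]
    var = [math.exp(sum(w * hj for w, hj in zip(row, h))) for row in W_var]
    return mu, var


def rollout_uncertainty(h, params, horizon):
    """Internal simulation: feed predictions back as inputs and sum the variance."""
    W_h, W_x, W_mu, W_var = params
    total = 0.0
    for _ in range(horizon):
        mu, var = predict(h, W_mu, W_var)
        total += sum(var)
        h = rnn_step(h, mu, W_h, W_x)  # closed loop: prediction becomes next input
    return total


def foresight_refine(h, params, n_samples=8, horizon=5, noise=0.1, rng=None):
    """Return the candidate hidden state with the lowest simulated uncertainty.

    The current state is kept as a candidate, so the result is never worse
    than doing nothing. Returns (best_state, best_score, original_score).
    """
    rng = rng or random.Random(0)
    candidates = [list(h)] + [
        [hj + rng.gauss(0.0, noise) for hj in h] for _ in range(n_samples)
    ]
    scores = [rollout_uncertainty(c, params, horizon) for c in candidates]
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best], scores[0]
```

In a control loop under these assumptions, `foresight_refine` would be called on the hidden state before each action is generated, with random weights replaced by those of a trained model. Keeping the unperturbed state among the candidates is a design choice of this sketch, not something the abstract states.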