State-of-the-art sensorimotor learning algorithms yield policies that can often produce unstable behaviors, damaging the robot and/or the environment. Traditional robot learning, by contrast, relies on dynamical system-based policies that can be analyzed for stability and safety. Such policies, however, are neither flexible nor generic and usually work only with proprioceptive sensor states. In this work, we bridge the gap between generic neural network policies and dynamical system-based policies by introducing Autonomous Neural Dynamic Policies (ANDPs), which (a) are based on autonomous dynamical systems, (b) always produce asymptotically stable behaviors, and (c) are more flexible than traditional stable dynamical system-based policies. ANDPs are fully differentiable, flexible, generic policies that can be used in imitation learning setups while ensuring asymptotic stability. In this paper, we explore the flexibility and capacity of ANDPs in several imitation learning tasks, including experiments with image observations. The results show that ANDPs combine the benefits of both neural network-based and dynamical system-based methods.