Robotics policies are always subjected to complex, second-order dynamics that entangle their actions with resulting states. In reinforcement learning (RL) contexts, policies bear the burden of deciphering these complicated interactions from massive amounts of experience and complex reward functions in order to learn how to accomplish tasks. Moreover, policies typically issue actions directly to controllers such as Operational Space Control (OSC) or joint PD control, which induce straight-line motion toward these action targets in task or joint space. However, straight-line motion in these spaces for the most part does not capture the rich, nonlinear behavior our robots need to exhibit, shifting the burden of discovering these behaviors more completely to the agent. Unlike these simpler controllers, geometric fabrics capture a much richer and more desirable set of behaviors via artificial, second-order dynamics grounded in nonlinear geometry. These artificial dynamics shift the uncontrolled dynamics of a robot via an appropriate control law to form behavioral dynamics. Behavioral dynamics unlock a new action space and safe, guiding behavior over which RL policies are trained. They enable bang-bang-like RL policy actions that are still safe for real robots, simplify reward engineering, and help sequence real-world, high-performance policies. We describe the framework more generally and create a specific instantiation for the problem of dexterous, in-hand reorientation of a cube by a highly actuated robot hand.