Most dynamics functions are not well aligned to task requirements. Controllers, therefore, often invert the dynamics and reshape them into something more useful. The learning community has found that such controllers, for example Operational Space Control (OSC), can offer important inductive biases for training. However, OSC captures only straight-line end-effector motion, and there is much more behavior we could and should be packing into these systems. Earlier work [15][16][19] developed a theory that generalized these ideas and constructed a broad and flexible class of second-order dynamical systems that is expressive enough to capture substantial behavior (such as that listed above) while maintaining the stability properties that make OSC and controllers like it a good foundation for policy design and learning. This paper, motivated by the empirical success of the types of fabrics used in [20], reformulates the theory of fabrics into a form that is more general and easier to apply to policy learning problems. We focus on the stability properties that make fabrics a good foundation for policy synthesis. Fabrics create a fundamentally stable medium within which a policy can operate; they influence the system's behavior without preventing it from achieving tasks within its constraints. When a fabric is geometric (path consistent), we can interpret it as forming a road network of paths that the system wants to follow at constant speed absent a forcing policy, which gives geometric intuition to its role as a prior. The policy operating over the geometric fabric modulates speed and steers the system from one road to the next as it accomplishes its task. We reformulate the theory of fabrics rigorously and develop theoretical results that characterize system behavior and illuminate how to design these systems, emphasizing intuition throughout.