Task-specific, learning-based control synthesis frameworks achieve impressive empirical results, but a unified framework that systematically constructs an optimal policy for sufficiently solving a general notion of a task is still absent. We therefore propose a theoretical framework for task-centered control synthesis that leverages two critical ideas: 1) oracle-guided policy optimization, which integrates sub-optimal task-based priors to guide policy optimization without limiting it, and 2) task-vital multimodality, which breaks down solving a task into executing a sequence of behavioral modes. The proposed approach yields highly agile parkour and diving on a 16-DoF dynamic bipedal robot. For the parkour task, the obtained policy advances indefinitely on a track, performing leaps and jumps of varying lengths and heights. For the diving task, the policy demonstrates front, back, and side flips from various initial heights. Finally, we introduce a novel latent mode space reachability analysis to study the versatility and generalization of our policies: we compute a feasible mode set function, through which we certify a set of failure-free modes that our policy can perform at any given state.