This paper presents a novel learning-based control framework that uses keyframing to incorporate high-level objectives into natural locomotion for legged robots. These high-level objectives are specified as a variable number of partial or complete pose targets spaced arbitrarily in time. Our framework uses a multi-critic reinforcement learning algorithm to effectively handle the mixture of dense and sparse rewards, and a transformer-based encoder to accommodate a variable number of input targets, each associated with a specific time-to-arrival. Through simulation and hardware experiments, we demonstrate that our framework can effectively satisfy the target keyframe sequence at the required times. In these experiments, the multi-critic method significantly reduces hyperparameter-tuning effort compared with the standard single-critic alternative. Moreover, the transformer-based architecture enables robots to anticipate future goals, which quantitatively improves their ability to reach their targets.
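The abstract does not say how the multi-critic algorithm combines its critics. The sketch below shows one common formulation, which is assumed here and not taken from the paper. Each reward group (hypothetically named `dense` and `sparse`) has its own critic. Advantages are computed per group with GAE, normalized separately, and then combined as a weighted sum for the policy update. Per-group normalization is one plausible reason a multi-critic setup needs less reward-weight tuning than a single critic trained on the summed reward: a sparse keyframe bonus is no longer drowned out by dense terms of a different scale. The function names `gae` and `multi_critic_advantage` are illustrative.

```python
import numpy as np


def gae(rewards, values, next_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one reward group over a rollout of length T."""
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        # Bootstrap from the critic's estimate of the next state's value.
        nv = next_value if t == T - 1 else values[t + 1]
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * nv * nonterminal - values[t]
        last = delta + gamma * lam * nonterminal * last
        adv[t] = last
    return adv


def multi_critic_advantage(reward_groups, value_groups, next_values, dones, weights, eps=1e-8):
    """Hypothetical multi-critic combination: one critic per reward group,
    per-group advantage normalization, then a weighted sum."""
    total = np.zeros(len(dones))
    for name, w in weights.items():
        a = gae(reward_groups[name], value_groups[name], next_values[name], dones)
        # Normalizing each group separately removes its reward scale.
        a = (a - a.mean()) / (a.std() + eps)
        total += w * a
    return total
```

With this design, scaling one group's rewards (and its critic's values) by a positive constant leaves the combined advantage essentially unchanged. The weights then express relative priority between objectives instead of compensating for reward magnitudes.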
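The abstract states that a transformer-based encoder handles a variable number of targets, each with its own time-to-arrival. It does not give the architecture. The following NumPy sketch shows how attention can map a padded, variable-length set of keyframes to a fixed-size latent. Several details are assumptions, not the paper's design: a single untrained attention layer, partial poses encoded with a per-dimension "specified" mask, time-to-arrival appended as a scalar feature, and masked mean pooling. The class name `KeyframeEncoder` is hypothetical.

```python
import numpy as np


class KeyframeEncoder:
    """Hypothetical single-head self-attention encoder over a padded set of keyframe targets."""

    def __init__(self, pose_dim, d_model=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = 2 * pose_dim + 1  # masked pose values + specified-dim mask + time-to-arrival
        self.W_in = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, d_model))
        s = 1.0 / np.sqrt(d_model)
        self.W_q, self.W_k, self.W_v = (rng.normal(scale=s, size=(d_model, d_model)) for _ in range(3))
        self.d_model = d_model

    def __call__(self, poses, dim_mask, time_to_arrival, valid):
        """
        poses:           (N, pose_dim) target values; entries with dim_mask == 0 are ignored
        dim_mask:        (N, pose_dim) 1.0 where the keyframe specifies that pose dimension
        time_to_arrival: (N,) time until each target should be reached
        valid:           (N,) bool, False for padding slots; at least one must be True
        returns:         (d_model,) fixed-size latent that a policy could consume
        """
        x = np.concatenate([poses * dim_mask, dim_mask, time_to_arrival[:, None]], axis=1)
        h = x @ self.W_in
        q, k, v = h @ self.W_q, h @ self.W_k, h @ self.W_v
        scores = q @ k.T / np.sqrt(self.d_model)
        # Padding slots receive zero attention weight.
        scores = np.where(valid[None, :], scores, -np.inf)
        scores = scores - scores.max(axis=1, keepdims=True)
        attn = np.exp(scores)
        attn /= attn.sum(axis=1, keepdims=True)
        out = h + attn @ v  # residual connection
        # Mean-pool over real targets only, so padding never affects the result.
        return out[valid].mean(axis=0)
```

This sketch has no positional encoding, so the order in which targets are listed carries no information. Timing enters only through the time-to-arrival feature, and extra padded slots leave the latent unchanged.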