Developing a model for multi-task humanoid control poses several challenges. Reinforcement learning and imitation learning are the dominant approaches in this domain, but there is a trade-off between them: reinforcement learning is ill-suited to training a humanoid to perform multiple behaviors because of its training time and model size, while imitation learning from kinematic data alone cannot capture the true physics of the motion. Training models to perform multiple complex tasks requires long training times due to the high degrees of freedom (DoF) and the complexity of the movements. Although training models offline would be beneficial, another issue is the size of the dataset, which usually must be quite large to encapsulate multiple movements. There are few implementations of transformer-based models that control humanoid characters and predict their motion from a large dataset of recorded/reference motions. In this paper, we pre-train a GPT on a large dataset of noisy expert-policy rollout observations from a humanoid motion dataset, then fine-tune that model on a smaller dataset of noisy expert-policy rollout observations and actions to autoregressively generate physically plausible motion trajectories. We show that it is possible to train a GPT-based foundation model on a smaller dataset, in a shorter training time, to control a humanoid in a realistic physics environment and perform human-like movements.
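The two-stage recipe above (pre-train on observation-only rollouts, fine-tune on a smaller set of observation-action pairs, then generate autoregressively) can be sketched with a deliberately simplified linear stand-in. Everything here is a hypothetical illustration, not the paper's implementation: the synthetic dynamics `A_true`, the expert policy `W_true`, and the least-squares fits standing in for actual GPT training are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM = 4, 2

# Synthetic stand-ins for the datasets described in the abstract:
# hypothetical linear dynamics and expert policy, NOT the real
# humanoid motion data or expert-policy rollouts.
A_true = rng.normal(size=(OBS_DIM, OBS_DIM)) * 0.3  # unknown dynamics
W_true = rng.normal(size=(OBS_DIM, ACT_DIM)) * 0.3  # unknown expert policy

def make_rollout(T):
    """One noisy rollout of observations under the synthetic dynamics."""
    obs = [rng.normal(size=OBS_DIM)]
    for _ in range(T - 1):
        obs.append(obs[-1] @ A_true + 0.01 * rng.normal(size=OBS_DIM))
    return np.stack(obs)

pretrain_obs = [make_rollout(50) for _ in range(20)]  # large, obs-only
finetune_obs = [make_rollout(20) for _ in range(5)]   # small, obs + actions

# Stage 1: "pre-train" a next-observation predictor on the large,
# observation-only set (least squares stands in for GPT training).
X = np.concatenate([t[:-1] for t in pretrain_obs])
Y = np.concatenate([t[1:] for t in pretrain_obs])
A = np.linalg.lstsq(X, Y, rcond=None)[0]

# Stage 2: "fine-tune" an action head on the smaller labelled set,
# where expert actions are available alongside observations.
X2 = np.concatenate(finetune_obs)
Y2 = X2 @ W_true  # expert actions for those observations
W = np.linalg.lstsq(X2, Y2, rcond=None)[0]

# Autoregressive generation: at each step, predict an action from the
# current observation, then feed the model's own next-observation
# prediction back in as context for the following step.
def rollout(obs0, T):
    obs, out = obs0, []
    for _ in range(T):
        out.append((obs.copy(), obs @ W))  # (observation, action)
        obs = obs @ A                      # model's own next-obs prediction
    return out

traj = rollout(rng.normal(size=OBS_DIM), 10)
```

The design point the sketch mirrors is the split between a large unlabelled pre-training corpus and a small labelled fine-tuning corpus: the dynamics model is fit where data is plentiful, and only the observation-to-action mapping needs the scarce action-annotated rollouts.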