In recent years, research on humanoid robots has attracted significant attention, and reinforcement learning-based control algorithms in particular have achieved major breakthroughs. Compared to traditional model-based control algorithms, reinforcement learning-based algorithms demonstrate substantial advantages in handling complex tasks. Leveraging the large-scale parallel computing capabilities of GPUs, contemporary humanoid robots can undergo extensive parallel training in simulated environments, so a physics simulation platform capable of large-scale parallel training is crucial for their development. As one of the most complex robot forms, humanoid robots typically possess intricate mechanical structures encompassing numerous serial and parallel mechanisms. However, many reinforcement learning-based humanoid robot control algorithms currently employ open-loop topologies during training and defer the conversion to series-parallel structures until the sim2real phase. This approach is primarily due to the limitations of physics engines: current GPU-based physics engines often support only open-loop topologies or have limited capabilities for simulating multi-rigid-body closed-loop topologies. To enable reinforcement learning-based humanoid robot control algorithms to be trained in large-scale parallel environments, we propose LiPS, a novel training method. By incorporating multi-rigid-body dynamics modeling in the simulation environment, we significantly reduce the sim2real gap and the difficulty of converting to parallel structures during model deployment, thereby robustly supporting large-scale reinforcement learning for humanoid robots.
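To make the closed-loop topology issue concrete, the following is a minimal sketch, not the LiPS implementation, of how a parallel (four-bar) linkage can be declared as an open kinematic tree and then closed with a loop-closure constraint. It assumes the MuJoCo Python bindings and MuJoCo's `connect` equality constraint; the link names, dimensions, and geometry are purely illustrative.

```python
# Minimal sketch (illustrative, not the LiPS pipeline): a four-bar linkage is
# modeled as an open tree (base -> crank -> coupler, base -> rocker) and the
# kinematic loop is closed with a "connect" equality constraint, i.e. a
# ball-joint-like pin between the coupler tip and the rocker tip.
import mujoco

FOUR_BAR_XML = """
<mujoco model="four_bar_sketch">
  <option timestep="0.002"/>
  <default>
    <!-- disable collisions so the pinned tips do not generate spurious contacts -->
    <geom contype="0" conaffinity="0"/>
  </default>
  <worldbody>
    <body name="base" pos="0 0 0.5">
      <geom type="box" size="0.2 0.02 0.02"/>
      <!-- first branch of the open tree: crank and coupler -->
      <body name="crank" pos="-0.2 0 0">
        <joint name="crank_hinge" type="hinge" axis="0 1 0"/>
        <geom type="capsule" fromto="0 0 0  0 0 -0.2" size="0.01"/>
        <body name="coupler" pos="0 0 -0.2">
          <joint name="coupler_hinge" type="hinge" axis="0 1 0"/>
          <geom type="capsule" fromto="0 0 0  0.4 0 0" size="0.01"/>
        </body>
      </body>
      <!-- second branch: rocker; the loop is NOT closed in the tree itself -->
      <body name="rocker" pos="0.2 0 0">
        <joint name="rocker_hinge" type="hinge" axis="0 1 0"/>
        <geom type="capsule" fromto="0 0 0  0 0 -0.2" size="0.01"/>
      </body>
    </body>
  </worldbody>
  <!-- close the loop: pin the coupler tip (anchor in coupler frame) to the rocker -->
  <equality>
    <connect body1="coupler" body2="rocker" anchor="0.4 0 0"/>
  </equality>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(FOUR_BAR_XML)
data = mujoco.MjData(model)
for _ in range(1000):
    mujoco.mj_step(model, data)  # constraint solver enforces the closed loop
print("active constraint rows:", data.nefc)
```

In contrast, a purely open-loop approximation would drop the `equality` element and represent the parallel mechanism with independent serial joints, which is the simplification many GPU-based training setups make and later undo at deployment time.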