Humanoid robots performing in-field manipulation tasks, such as robotic apple harvesting, face severe energy constraints that directly limit the number of reaching motions they can execute per battery charge. This paper presents an end-to-end, energy-aware reinforcement learning (RL) framework for the 7-degree-of-freedom left arm of the Unitree~G1 humanoid robot that combines a physics-based, experimentally identified electrical power model with a Soft Actor-Critic (SAC) policy trained in a Pinocchio-based rigid-body dynamics simulator. The policy acts in an incremental joint-position action space and is trained with a Hybrid Constellation Reward that combines a four-point end-effector constellation distance with a torque-norm energy proxy. After $5\times10^6$ training steps, it reaches a $69.9\%$ success rate over $1\,000$ random targets in kinematic simulation, with a mean energy consumption of \SI{98.16}{\joule} over successful episodes. On the physical Unitree~G1, the policy is validated over three independent 10-target batches, achieving a mean energy consumption of \SI{71.5 \pm 48.3}{\joule}, an end-effector position error of \SI{2.64 \pm 1.04}{\centi\metre}, and an orientation error of $6.92 \pm 1.33^\circ$, all within the \SI{4}{\centi\metre}/$8.6^\circ$ training tolerance. These results constitute a first step toward energy-aware, RL-based arm reaching for humanoid robots.
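The abstract names the reward terms and the success tolerance but does not give their exact form. The Python sketch below illustrates one plausible reading. Every specific in it is an assumption, not a detail taken from the paper: the four constellation points are the end-effector origin plus points offset 5 cm along its local axes; the distance term is the mean point-to-point distance; the energy proxy is the squared joint-torque norm; the weights `w_dist` and `w_energy` are arbitrary; and success means a position error of at most 4 cm and a geodesic orientation error of at most 8.6°. The incremental-action helper likewise assumes SAC actions squashed to [-1, 1] and a hypothetical per-step scale. All function and constant names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]  # row-major 3x3 rotation matrix

# ASSUMPTION: constellation = end-effector origin + 3 points along its local axes.
CONSTELLATION_OFFSET = 0.05  # metres (hypothetical)
LOCAL_POINTS: List[Vec3] = [
    (0.0, 0.0, 0.0),
    (CONSTELLATION_OFFSET, 0.0, 0.0),
    (0.0, CONSTELLATION_OFFSET, 0.0),
    (0.0, 0.0, CONSTELLATION_OFFSET),
]


def transform(R: Mat3, p: Vec3, x: Vec3) -> Vec3:
    """Map a point x from the end-effector frame to the world frame: R @ x + p."""
    return tuple(sum(R[i][j] * x[j] for j in range(3)) + p[i] for i in range(3))


def constellation_distance(R: Mat3, p: Vec3, R_goal: Mat3, p_goal: Vec3) -> float:
    """Mean distance between corresponding constellation points at the current and goal poses."""
    total = 0.0
    for x in LOCAL_POINTS:
        total += math.dist(transform(R, p, x), transform(R_goal, p_goal, x))
    return total / len(LOCAL_POINTS)


def orientation_error_deg(R: Mat3, R_goal: Mat3) -> float:
    """Geodesic angle of R_goal^T R in degrees, via trace(R_goal^T R) = sum_ij R_goal[i][j] * R[i][j]."""
    tr = sum(R_goal[i][j] * R[i][j] for i in range(3) for j in range(3))
    c = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp for numerical safety
    return math.degrees(math.acos(c))


def hybrid_reward(R: Mat3, p: Vec3, R_goal: Mat3, p_goal: Vec3,
                  tau: Sequence[float], w_dist: float = 1.0, w_energy: float = 1e-3) -> float:
    """Negative weighted sum of constellation distance and squared torque norm (ASSUMED form)."""
    d = constellation_distance(R, p, R_goal, p_goal)
    energy_proxy = sum(t * t for t in tau)
    return -w_dist * d - w_energy * energy_proxy


def is_success(R: Mat3, p: Vec3, R_goal: Mat3, p_goal: Vec3,
               pos_tol: float = 0.04, ang_tol_deg: float = 8.6) -> bool:
    """Success test using the 4 cm / 8.6 deg tolerance quoted in the abstract."""
    return (math.dist(p, p_goal) <= pos_tol
            and orientation_error_deg(R, R_goal) <= ang_tol_deg)


def apply_incremental_action(q: Sequence[float], a: Sequence[float], step_scale: float,
                             q_min: Sequence[float], q_max: Sequence[float]) -> List[float]:
    """Incremental joint-position command: q + step_scale * a, clipped to joint limits."""
    out = []
    for qi, ai, lo, hi in zip(q, a, q_min, q_max):
        ai = max(-1.0, min(1.0, ai))  # ASSUMPTION: actions squashed to [-1, 1]
        out.append(max(lo, min(hi, qi + step_scale * ai)))
    return out
```

Comparing several points rigidly attached to the end effector lets one distance term penalize both position and orientation mismatch, which is presumably why a constellation is used rather than a single-point distance; the paper itself should be consulted for the actual point layout and weighting.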