Modular reconfigurable robots suit task-specific space operations, but the combinatorial growth of morphologies hinders unified control. We propose a decentralized reinforcement learning (Dec-RL) scheme in which each module learns its own policy: wheel modules use Soft Actor-Critic (SAC) for locomotion, and 7-DoF limbs use Proximal Policy Optimization (PPO) for steering and manipulation, enabling zero-shot generalization to unseen configurations. In simulation, the steering policy achieved a mean absolute error of 3.63° between desired and induced angles; the manipulation policy plateaued at 84.6% success on a target-offset criterion; and the wheel policy cut average motor torque by 95.4% relative to the baseline while maintaining a 99.6% success rate. Lunar-analogue field tests validated zero-shot integration for autonomous locomotion, steering, and preliminary alignment for reconfiguration. The system transitioned smoothly among synchronous, parallel, and sequential policy-execution modes, without idle states or control conflicts, indicating a scalable, reusable, and robust approach for modular lunar robots.
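To make the decentralized structure concrete, the sketch below is a minimal illustration, not the paper's implementation: it assumes hypothetical names (Module, step_synchronous, the *_stub policies), and in the actual system the per-module policies would be trained SAC/PPO networks rather than stubs. It shows the core idea that each module carries its own policy and a lightweight coordinator only dispatches per-module observations, so no centralized controller depends on the full morphology.

```python
# Minimal sketch of decentralized per-module policy execution (hypothetical
# interfaces; real policies would be trained SAC/PPO networks loaded from
# checkpoints, not the zero-action stubs used here).
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Module:
    name: str
    policy: Callable[[List[float]], List[float]]  # local obs -> local action


def wheel_policy_stub(obs: List[float]) -> List[float]:
    # Placeholder for a SAC locomotion policy: maps wheel-local observations
    # (e.g. slip, torque feedback) to wheel torque commands.
    return [0.0 for _ in obs]


def limb_policy_stub(obs: List[float]) -> List[float]:
    # Placeholder for a PPO steering/manipulation policy on a 7-DoF limb:
    # maps limb-local observations to seven joint commands.
    return [0.0] * 7


def step_synchronous(modules: Dict[str, Module],
                     observations: Dict[str, List[float]]) -> Dict[str, List[float]]:
    # Each module acts from its own observation only; no centralized policy
    # sees the whole morphology, so adding or removing modules (i.e. a new
    # configuration) does not require retraining the others.
    return {name: m.policy(observations[name]) for name, m in modules.items()}


if __name__ == "__main__":
    robot = {
        "wheel_front_left": Module("wheel_front_left", wheel_policy_stub),
        "limb_right": Module("limb_right", limb_policy_stub),
    }
    obs = {"wheel_front_left": [0.1, 0.0], "limb_right": [0.0] * 14}
    print(step_synchronous(robot, obs))
```

Under this assumed structure, the parallel and sequential execution modes mentioned above would differ only in how the coordinator schedules the per-module policy calls, not in the policies themselves.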