Designing high-performance legged robots requires jointly optimizing morphology and control. Model-free Reinforcement Learning (RL) offers an alternative to model-predictive control for developing robust controllers without explicitly specifying robot dynamics. As a result, RL has been used both to train controllers and to evaluate candidate designs in robot morphology optimization. While RL has shown success in locomotion, using it in the co-design inner loop is expensive because a policy must be retrained for every candidate design. Universal policies conditioned on morphology offer a promising alternative, but they suffer from behavioral diversity collapse, converging to a single strategy that performs sub-optimally across designs. End-to-end Mixture-of-Experts (MoE) architectures, on the other hand, fail because their representations collapse. We propose Gaussian Evolutionary Specialists (GES), a framework that decouples design-space partitioning from policy learning to capture diverse behaviors explicitly. GES assigns specialist policies to evolving Gaussian regions of the design space and iteratively refines them via training, probing, and territory expansion. The resulting specialists are integrated into a design sampling loop, replacing costly re-training with direct evaluation. When tested on the Buoyancy-Assisted Light Legged Unit (BALLU), GES discovers designs with 5-25% higher performance than naive universal policies. On hardware, a GES-optimized design overcomes a 24 cm tall obstacle, a 3x improvement over the baseline BALLU design. Moreover, GES reduces design optimization time by 37%.
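The idea of routing each candidate design to a specialist and scoring it by direct evaluation can be illustrated with a minimal sketch. This is not the authors' implementation: the `Specialist` class, the diagonal-Gaussian regions, the maximum-density routing rule, the uniform sampler, and the `evaluate` callable (a stand-in for rolling out a trained policy) are all assumptions made for illustration. The training, probing, and territory-expansion steps are omitted.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Design = Sequence[float]


@dataclass
class Specialist:
    """Hypothetical specialist: a diagonal Gaussian region plus a scoring stub."""
    mean: List[float]
    std: List[float]
    evaluate: Callable[[Design], float]  # assumed stand-in for a policy rollout

    def log_density(self, design: Design) -> float:
        # Log-density of a diagonal Gaussian at the given design point.
        total = 0.0
        for x, m, s in zip(design, self.mean, self.std):
            total += -0.5 * ((x - m) / s) ** 2 - math.log(s)
            total -= 0.5 * math.log(2.0 * math.pi)
        return total


def route(design: Design, specialists: List[Specialist]) -> Specialist:
    # Assumed routing rule: pick the specialist whose region assigns
    # the highest density to this design.
    return max(specialists, key=lambda sp: sp.log_density(design))


def sample_best_design(
    specialists: List[Specialist],
    bounds: List[Tuple[float, float]],
    n_samples: int,
    seed: int = 0,
) -> Tuple[List[float], float]:
    # Design sampling loop: each candidate is scored by evaluating the
    # routed specialist directly, with no per-design retraining.
    rng = random.Random(seed)
    best_design: List[float] = []
    best_score = -math.inf
    for _ in range(n_samples):
        design = [rng.uniform(lo, hi) for lo, hi in bounds]
        score = route(design, specialists).evaluate(design)
        if score > best_score:
            best_design, best_score = design, score
    return best_design, best_score
```

In this sketch the cost of scoring a candidate is one density comparison per specialist plus one evaluation, which is the property the abstract attributes to replacing re-training with direct evaluation. A real sampler would likely be guided rather than uniform.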