Co-design involves simultaneously optimizing the controller and the agent's physical design. Its inherent bi-level optimization formulation requires an outer-loop design optimization driven by an inner-loop control optimization. This can be challenging when the design space is large and each design evaluation involves a data-intensive reinforcement learning process for control optimization. To improve sample efficiency, we propose a multi-fidelity design exploration strategy based on Hyperband, in which the controllers learnt across the design space are tied together through a universal policy learner that warm-starts subsequent controller learning problems. Further, we recommend a particular order for traversing the Hyperband-generated design matrix, chosen so that the warm-starting effect of the universal policy learner, which strengthens with each new design evaluation, reduces the stochasticity of Hyperband as much as possible. Experiments on a wide range of agent design problems show that our method outperforms the baselines. Additionally, analysis of the optimized designs reveals interesting design alterations, including design simplifications and non-intuitive alterations that have also emerged in the biological world.
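For readers unfamiliar with Hyperband, the multi-fidelity budget allocation it produces can be sketched as follows. This is only an illustration of one common statement of the generic Hyperband schedule, not the method proposed here: the function name `hyperband_schedule`, the parameters `max_budget` and `eta`, and the reading of "budget" as training effort per design are assumptions, and neither the universal policy learner nor the recommended traversal order is modelled.

```python
import math


def hyperband_schedule(max_budget, eta=3):
    """Return Hyperband brackets as lists of (num_designs, budget_per_design).

    Hypothetical helper for illustration: `max_budget` is the largest
    training budget spent on one design, `eta` the halving rate.
    """
    # Largest s with eta**s <= max_budget, computed with integers
    # to avoid floating-point logarithm rounding.
    s_max = 0
    while eta ** (s_max + 1) <= max_budget:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # Number of designs sampled at the start of this bracket.
        n = math.ceil((s_max + 1) * eta ** s / (s + 1))
        rounds = []
        for i in range(s + 1):
            n_i = n // eta ** i                  # designs kept in round i
            r_i = max_budget / eta ** (s - i)    # budget per design in round i
            rounds.append((n_i, r_i))
        brackets.append(rounds)
    return brackets


if __name__ == "__main__":
    for bracket in hyperband_schedule(81, eta=3):
        print(bracket)
```

Each bracket starts with many designs at a low budget and keeps roughly the best `1/eta` fraction at each round; the rows of such a schedule correspond to what the abstract calls the Hyperband-generated design matrix.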