We introduce MORPH, a method that uses reinforcement learning (RL) to co-optimize hardware design parameters and control policies in simulation. Like most co-optimization methods, MORPH relies on a model of the hardware being optimized, usually a simulation based on the laws of physics. However, such a model is often difficult to integrate into an effective optimization routine. To address this, we introduce a proxy hardware model that is always differentiable and can be co-optimized efficiently alongside a long-horizon control policy using RL. MORPH is designed to keep the optimized hardware proxy as close as possible to its realistic counterpart while still enabling task completion. We demonstrate our approach on simulated 2D reaching and 3D multi-fingered manipulation tasks.
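The idea of training a policy through a differentiable proxy while keeping that proxy close to the real hardware can be illustrated with a toy example. The sketch below is not the MORPH algorithm: it is a minimal 1D reaching problem in which plain gradient descent stands in for RL, and every name in it (`real_tip`, `proxy_tip`, `co_optimize`, `TARGET`, `LAM`, `LR`) is a hypothetical chosen for illustration. It assumes a single design parameter (a link length), a one-parameter proxy (a gain), and a quadratic penalty on the gap between proxy and real hardware. The way the real design is pulled toward the proxy, using finite differences on a black-box simulator, is also an assumption and is not taken from the abstract.

```python
import math

TARGET = 1.5  # hypothetical 1D reaching goal, beyond the initial link length
LAM = 1.0     # hypothetical weight on the proxy/real mismatch penalty
LR = 0.05     # gradient step size


def real_tip(length, action):
    """Stand-in for a physics simulator: only evaluated, never differentiated."""
    return length * action


def proxy_tip(gain, action):
    """Differentiable hardware proxy: tip position = gain * action."""
    return gain * action


def co_optimize(steps=2000, eps=1e-4):
    """Jointly update policy parameter w, proxy gain, and real link length."""
    w, gain, length = 0.0, 1.0, 1.0
    for _ in range(steps):
        action = math.tanh(w)  # bounded action, so hardware must change to reach TARGET
        tip = proxy_tip(gain, action)
        task_err = tip - TARGET
        gap = tip - real_tip(length, action)

        # Analytic gradient of task_err**2 + LAM * gap**2 through the proxy only;
        # the real simulator output is treated as a constant here.
        d_tip = 2.0 * task_err + 2.0 * LAM * gap
        grad_gain = d_tip * action
        grad_w = d_tip * gain * (1.0 - action * action)

        # The real design sees only the mismatch term, via central differences,
        # because the simulator is assumed to be a black box.
        gap_plus = tip - real_tip(length + eps, action)
        gap_minus = tip - real_tip(length - eps, action)
        grad_length = LAM * (gap_plus ** 2 - gap_minus ** 2) / (2.0 * eps)

        w -= LR * grad_w
        gain -= LR * grad_gain
        length -= LR * grad_length
    return w, gain, length


if __name__ == "__main__":
    w, gain, length = co_optimize()
    print("proxy tip:", proxy_tip(gain, math.tanh(w)))
    print("proxy gain:", gain, "real link length:", length)
```

Because the action is bounded by `tanh`, the target cannot be reached with the initial link length of 1.0, so the design has to change along with the policy. The mismatch penalty is what keeps the proxy from drifting to a gain that the real hardware does not match.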