Models with fewer parameters are necessary for the neural control of memory-limited, performant robots, but finding these smaller neural network architectures can be time-consuming. We propose HyperPPO, an on-policy reinforcement learning algorithm that uses graph hypernetworks to estimate the weights of multiple neural architectures simultaneously. Our method estimates weights for networks that are much smaller than those in common use yet encode highly performant policies. We obtain multiple trained policies at the same time while maintaining sample efficiency, and let the user pick a network architecture that satisfies their computational constraints. We show that our method scales well: more training resources produce faster convergence to higher-performing architectures. We demonstrate that the neural policies estimated by HyperPPO are capable of decentralized control of a Crazyflie2.1 quadrotor. Website: https://sites.google.com/usc.edu/hyperppo
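To make the idea of picking an architecture under a computational constraint concrete, here is a minimal sketch of how one might count the parameters of fully connected policy networks and keep only those that fit a memory budget. It is not part of HyperPPO itself: the observation and action dimensions, the candidate hidden-layer widths, and the budget are all hypothetical values chosen for illustration, and the helper names `mlp_param_count` and `architectures_within_budget` are our own.

```python
from typing import List, Sequence, Tuple


def mlp_param_count(obs_dim: int, hidden: Sequence[int], act_dim: int) -> int:
    """Count the weights and biases of a fully connected policy network."""
    sizes = [obs_dim, *hidden, act_dim]
    # Each dense layer has n_in * n_out weights plus n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


def architectures_within_budget(
    candidates: Sequence[Sequence[int]],
    obs_dim: int,
    act_dim: int,
    max_params: int,
) -> List[Tuple[Tuple[int, ...], int]]:
    """Return (hidden_widths, param_count) pairs that fit the budget, smallest first."""
    sized = [(tuple(h), mlp_param_count(obs_dim, h, act_dim)) for h in candidates]
    return sorted((p for p in sized if p[1] <= max_params), key=lambda p: p[1])


# Hypothetical values, assumed only for this example.
OBS_DIM, ACT_DIM = 18, 4  # a quadrotor has four motors; the observation size is a guess
CANDIDATES = [(4,), (8, 8), (16, 16), (32, 32, 32), (64, 64)]

feasible = architectures_within_budget(CANDIDATES, OBS_DIM, ACT_DIM, max_params=1000)
for hidden, n_params in feasible:
    print(hidden, n_params)
```

With a set of policies trained simultaneously, a filter like this would narrow the candidates to those that fit the target hardware; choosing among the survivors would then depend on their measured performance, which this sketch does not model.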