Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies. However, learning a highly performant universal policy requires sophisticated architectures like transformers (TF), which incur higher memory and computational costs than simpler multi-layer perceptrons (MLPs). To achieve both TF-level performance and MLP-level efficiency at inference time, we propose HyperDistill, which consists of: (1) a morphology-conditioned hypernetwork (HN) that generates robot-wise MLP policies, and (2) a policy distillation approach that is essential for successful training. We show that on UNIMAL, a benchmark with hundreds of diverse morphologies, HyperDistill performs as well as a universal TF teacher policy on both training and unseen test robots, while reducing model size by 6-14 times and computational cost by 67-160 times across different environments. Our analysis attributes the inference-time efficiency advantage of HyperDistill to knowledge decoupling, i.e., the ability to decouple inter-task and intra-task knowledge, a general principle that could also be applied to improve inference efficiency in other domains.
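To illustrate the core idea, here is a minimal NumPy sketch of a morphology-conditioned hypernetwork generating a robot-specific MLP policy. All dimensions, names, and the single-linear-layer hypernetwork are illustrative assumptions, not the paper's actual architecture; the point is that the (potentially expensive) hypernetwork runs once per robot, while the cheap generated MLP handles every control step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper): morphology
# embedding, observation, hidden, and action sizes of the generated policy.
EMB, OBS, HID, ACT = 8, 16, 32, 4
N_PARAMS = HID * OBS + HID + ACT * HID + ACT  # flat size of the MLP's params

def hypernetwork(morph_emb, W_hn, b_hn):
    """A single linear hypernetwork layer: maps a morphology embedding to a
    flat parameter vector for a robot-specific two-layer MLP policy."""
    return W_hn @ morph_emb + b_hn

def unpack_policy(theta):
    """Split the flat parameter vector into (W1, b1, W2, b2)."""
    i = 0
    W1 = theta[i:i + HID * OBS].reshape(HID, OBS); i += HID * OBS
    b1 = theta[i:i + HID]; i += HID
    W2 = theta[i:i + ACT * HID].reshape(ACT, HID); i += ACT * HID
    b2 = theta[i:i + ACT]
    return W1, b1, W2, b2

# Hypernetwork parameters: trained once, shared across all morphologies
# (here randomly initialized for illustration).
W_hn = rng.normal(scale=0.1, size=(N_PARAMS, EMB))
b_hn = np.zeros(N_PARAMS)

# Inference: the hypernetwork runs ONCE per robot; only the generated MLP
# is evaluated at every control step thereafter.
morph_emb = rng.normal(size=EMB)            # fixed morphology descriptor
W1, b1, W2, b2 = unpack_policy(hypernetwork(morph_emb, W_hn, b_hn))

obs = rng.normal(size=OBS)                  # per-step observation
action = W2 @ np.tanh(W1 @ obs + b1) + b2   # cheap per-step MLP forward pass
print(action.shape)
```

This decoupling is what the abstract calls knowledge decoupling: inter-task knowledge (how morphology determines a good controller) lives in the hypernetwork and is paid for once, while intra-task knowledge (the per-step control policy) runs in a compact MLP.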