Meta-reinforcement learning enables fast adaptation by extracting shared structure from related tasks, but existing end-to-end methods often couple task inference with embodiment-specific control. This coupling can obscure non-parametric task semantics, reduce sample efficiency, and limit cross-agent reuse. We propose a meta-knowledge reutilization framework that learns task-level knowledge on a dynamics-simplified agent and transfers it to heterogeneous agents. The framework uses a Bayesian non-parametric prior to organize latent task modes and a high-level policy to generate task-level magnitude guidance. To bridge reusable task knowledge and different embodiments, we introduce a semantic-magnitude interface and a lightweight temporal adaptor, which together convert frozen meta-knowledge into temporally aligned subgoals for embodiment-specific low-level controllers. Experiments on multiple locomotion agents show that our framework reduces final-step tracking error by 94.75% to 99.79% compared with recent state-of-the-art baselines and achieves comparable deployment performance using about 23.8% of their interaction data.