We introduce PACOH-RL, a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics. PACOH-RL meta-learns priors for the dynamics model, allowing swift adaptation to new dynamics with minimal interaction data. Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics, where data is costly to obtain. To address this, PACOH-RL incorporates regularization and epistemic uncertainty quantification in both the meta-learning and task adaptation stages. When facing new dynamics, we use these uncertainty estimates to effectively guide exploration and data collection. Overall, this enables positive transfer even when access to data from prior tasks or dynamics settings is severely limited. Our experimental results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions. Finally, on a real robotic car, we showcase the potential for efficient RL policy adaptation in diverse, data-scarce conditions.
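To make the role of epistemic uncertainty in exploration concrete, the following is a minimal, self-contained Python sketch of one common realization of this idea: an ensemble of learned dynamics models whose disagreement serves as an epistemic uncertainty proxy, added as an optimism bonus when ranking candidate actions. All names here (`DynamicsEnsemble`, `plan_with_uncertainty_bonus`, the linear toy dynamics, and the bonus weight `beta`) are illustrative assumptions, not the authors' implementation or the paper's exact mechanism.

```python
import numpy as np

class DynamicsEnsemble:
    """Toy ensemble of linear dynamics models: s' = A_i s + B_i a for each member i.

    Stands in for the meta-learned probabilistic dynamics model; member
    disagreement approximates epistemic uncertainty about the new dynamics.
    """
    def __init__(self, state_dim, action_dim, n_members=5, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(0.0, 0.1, (n_members, state_dim, state_dim))
        self.B = rng.normal(0.0, 0.1, (n_members, state_dim, action_dim))

    def predict(self, s, a):
        """Per-member next-state predictions, shape (n_members, state_dim)."""
        return np.einsum('mij,j->mi', self.A, s) + np.einsum('mij,j->mi', self.B, a)


def plan_with_uncertainty_bonus(model, s, candidate_actions, reward_fn, beta=1.0):
    """Pick the action maximizing predicted reward plus an epistemic bonus.

    The bonus (std. dev. of predictions across ensemble members) steers
    data collection toward regions where the dynamics model is uncertain.
    """
    best_a, best_score = None, -np.inf
    for a in candidate_actions:
        preds = model.predict(s, a)            # (n_members, state_dim)
        mean_next = preds.mean(axis=0)         # ensemble-mean next state
        epistemic = preds.std(axis=0).sum()    # disagreement = uncertainty proxy
        score = reward_fn(mean_next, a) + beta * epistemic
        if score > best_score:
            best_a, best_score = a, score
    return best_a


if __name__ == "__main__":
    model = DynamicsEnsemble(state_dim=3, action_dim=2)
    s = np.zeros(3)
    actions = [np.random.default_rng(i).uniform(-1, 1, 2) for i in range(8)]
    reward = lambda s_next, a: -np.linalg.norm(s_next)  # drive state toward origin
    a_star = plan_with_uncertainty_bonus(model, s, actions, reward, beta=0.5)
    print("chosen action:", a_star)
```

With `beta = 0` this reduces to greedy exploitation of the mean model; increasing `beta` trades off immediate predicted reward against visiting state-action regions where the ensemble disagrees, which is the sense in which the uncertainty estimates guide exploration and data collection under new dynamics.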