We introduce PACOH-RL, a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics. PACOH-RL meta-learns priors for the dynamics model, allowing swift adaptation to new dynamics with minimal interaction data. Existing Meta-RL methods require abundant meta-learning data, which limits their applicability in settings such as robotics, where data is costly to obtain. To address this, PACOH-RL incorporates regularization and epistemic uncertainty quantification in both the meta-learning and task adaptation stages. When facing new dynamics, we use these uncertainty estimates to guide exploration and data collection. Overall, this enables positive transfer even when access to data from prior tasks or dynamic settings is severely limited. Our experimental results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions. Finally, on a real robotic car, we showcase the potential for efficient RL policy adaptation in diverse, data-scarce conditions.