In this work, we study out-of-distribution (OOD) generalization in meta-reinforcement learning from an information-theoretic perspective. We begin by establishing OOD generalization bounds for meta-supervised learning under two distinct distribution shift scenarios: standard distribution mismatch and a broad-to-narrow training setting. Building on this foundation, we formalize the generalization problem in meta-reinforcement learning and establish fine-grained generalization bounds that exploit the structure of Markov Decision Processes. Lastly, we analyze the generalization performance of a gradient-based meta-reinforcement learning algorithm.