While Reinforcement Learning (RL) has made great strides towards solving increasingly complicated problems, many algorithms remain brittle to even slight environmental changes. Contextual Reinforcement Learning (cRL) provides a framework to model such changes in a principled manner, thereby enabling flexible, precise and interpretable task specification and generation. Our goal is to show how the cRL framework contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks. We confirm the insight that optimal behavior in cRL requires context information, as in other related areas of partial observability. To validate this empirically in the cRL framework, we provide context-extended versions of common RL environments. They are part of CARL, the first benchmark library designed for generalization based on cRL extensions of popular benchmarks, which we propose as a testbed for further study of general agents. We show that in the contextual setting, even simple RL environments become challenging, and that naive solutions are not enough to generalize across complex context spaces.