General intelligence requires quick adaptation across tasks. While existing reinforcement learning (RL) methods have made progress toward generalization, they typically assume only distribution changes between source and target domains. In this paper, we explore a wider range of scenarios in which both the distribution and the environment space may change. For example, in Atari games, we train agents to generalize to tasks with different modes and difficulty levels, where new state or action variables may appear that never occurred in previous environments. To address this challenging setting, we introduce a causality-guided self-adaptive representation-based approach, called CSR, that equips the agent to generalize effectively and efficiently across a sequence of tasks with evolving dynamics. Specifically, we employ causal representation learning to characterize the latent causal variables and world models within the RL system. Such compact causal representations uncover the structural relationships among variables, enabling the agent to autonomously determine whether changes in the environment stem from distribution shifts or from variations in the variable space, and to precisely locate these changes. We then devise a three-step strategy to fine-tune the model accordingly under each scenario. Empirical experiments show that CSR efficiently adapts to the target domains with only a few samples and outperforms state-of-the-art baselines in a wide range of scenarios, including our simulated environments, Cartpole, and Atari games.