We present Cross-Episodic Curriculum (CEC), a new algorithm that improves the learning efficiency and generalization of Transformer agents. The key idea of CEC is to place cross-episodic experiences into a Transformer's context, which forms the basis of a curriculum. By sequentially structuring online learning trials and mixed-quality demonstrations, CEC constructs curricula that capture learning progression and increasing proficiency across episodes. Combining this curricular structure with the strong pattern-recognition capabilities of Transformers yields a powerful cross-episodic attention mechanism. We demonstrate the effectiveness of CEC in two representative scenarios: (1) multi-task reinforcement learning with discrete control, as in DeepMind Lab, where the curriculum captures learning progression in both individual and progressively complex settings; and (2) imitation learning from mixed-quality data for continuous control, as in RoboMimic, where the curriculum captures the improvement in demonstrators' expertise. In all cases, policies trained with CEC achieve superior performance and strong generalization. Code is open-sourced at https://cec-agent.github.io/ to facilitate research on Transformer agent learning.
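The core mechanism, concatenating several episodes in order of increasing proficiency into a single Transformer context, can be illustrated with a minimal Python sketch. It is not the released CEC implementation: the `Episode` container, its `proficiency` score (an assumed stand-in for, e.g., training progress or demonstrator skill), `build_cross_episodic_context`, and `causal_mask` are hypothetical names chosen for illustration, and tokens are simplified to integer (observation, action) pairs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Episode:
    # Hypothetical container for one trajectory. `proficiency` is an assumed
    # scalar used only for ordering (e.g. training step or demonstrator skill).
    observations: List[int]
    actions: List[int]
    proficiency: float


def build_cross_episodic_context(
    episodes: List[Episode], max_len: int
) -> List[Tuple[int, int, int]]:
    """Concatenate episodes, least to most proficient, into one sequence.

    Each element is (curriculum_rank, observation, action), where
    curriculum_rank is the episode's position in the sorted curriculum.
    """
    if max_len <= 0:
        return []
    # Sort so the context reflects progression from weaker to stronger episodes.
    ordered = sorted(episodes, key=lambda ep: ep.proficiency)
    context: List[Tuple[int, int, int]] = []
    for rank, ep in enumerate(ordered):
        for obs, act in zip(ep.observations, ep.actions):
            context.append((rank, obs, act))
    # Truncate from the front so the most proficient (latest) steps are kept.
    return context[-max_len:]


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    episodes = [
        Episode(observations=[1, 2], actions=[0, 1], proficiency=0.9),
        Episode(observations=[3], actions=[1], proficiency=0.1),
        Episode(observations=[4, 5], actions=[0, 0], proficiency=0.5),
    ]
    ctx = build_cross_episodic_context(episodes, max_len=4)
    print(ctx)  # [(1, 4, 0), (1, 5, 0), (2, 1, 0), (2, 2, 1)]
    # The last step may attend to every earlier step, including the prior episode.
    print(causal_mask(len(ctx))[-1])  # [True, True, True, True]
```

Because the mask is causal over the concatenated sequence rather than reset at episode boundaries, each step of a later, more proficient episode can attend to all steps of the earlier episodes, which is what makes the attention cross-episodic in this sketch.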