Continual reinforcement learning (CRL) requires agents to learn from a sequence of tasks without forgetting previously acquired policies. In this work, we introduce a novel benchmark suite for CRL based on realistically simulated robots in the Gazebo simulator. Our Continual Robotic Simulation Suite (CRoSS) relies on two robotic platforms: a two-wheeled differential-drive robot equipped with lidar, camera, and bumper sensors, and a robotic arm with seven joints. The former serves as the agent in line-following and object-pushing scenarios, where varying visual and structural parameters yields a large number of distinct tasks; the latter is used in two goal-reaching scenarios, one with high-level Cartesian hand-position control (modeled after the Continual World benchmark) and one with low-level control based on joint angles. For the robotic arm benchmarks, we additionally provide kinematics-only variants that bypass physical simulation (as long as no sensor readings are required) and run two orders of magnitude faster. CRoSS is designed to be easily extensible and enables controlled studies of continual reinforcement learning in robotic settings with high physical realism; in particular, it allows the use of almost arbitrary simulated sensors. To ensure reproducibility and ease of use, we provide a containerized setup (Apptainer) that runs out of the box, and we report the performance of standard RL algorithms, including Deep Q-Networks (DQN) and policy gradient methods. These results highlight the suitability of CRoSS as a scalable and reproducible benchmark for CRL research.
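The CRL protocol the abstract describes, training on a sequence of tasks and then measuring how well earlier tasks are retained, can be sketched in miniature. The following is a hypothetical illustration, not the actual CRoSS API: the tasks here are toy one-step bandits standing in for the robotic scenarios, and a single shared value table plays the role of a naively fine-tuned policy, so the retention matrix exposes catastrophic forgetting.

```python
# Hypothetical sketch of a continual-RL evaluation protocol: train on each
# task in sequence, then evaluate on all tasks seen so far. All names here
# (ToyBanditTask, train, evaluate) are placeholders, not the CRoSS API.
import random


class ToyBanditTask:
    """Stand-in for a CRoSS task: a one-step bandit whose best arm differs per task."""

    def __init__(self, best_arm, n_arms=4):
        self.best_arm = best_arm
        self.n_arms = n_arms

    def step(self, action):
        # Reward 1 only for this task's best arm.
        return 1.0 if action == self.best_arm else 0.0


def train(q_values, task, episodes=500, lr=0.1, eps=0.2, seed=0):
    """Epsilon-greedy updates of a shared value table (naive fine-tuning)."""
    rng = random.Random(seed)
    for _ in range(episodes):
        if rng.random() < eps:
            a = rng.randrange(task.n_arms)  # explore
        else:
            a = max(range(task.n_arms), key=q_values.__getitem__)  # exploit
        r = task.step(a)
        q_values[a] += lr * (r - q_values[a])


def evaluate(q_values, task):
    """Greedy evaluation: reward of the currently preferred arm on this task."""
    return task.step(max(range(task.n_arms), key=q_values.__getitem__))


tasks = [ToyBanditTask(best_arm=i) for i in range(3)]  # a sequence of 3 tasks
q = [0.0] * 4  # one shared "policy" across all tasks
retention = []
for t_idx, task in enumerate(tasks):
    train(q, task)
    # Evaluate on every task seen so far; drops on earlier tasks = forgetting.
    retention.append([evaluate(q, past) for past in tasks[: t_idx + 1]])
```

After the loop, `retention[i][j]` holds the greedy reward on task `j` after training through task `i`; the fine-tuned table solves each current task but forgets earlier ones, which is exactly the failure mode CRL benchmarks are built to measure.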