Continual offline reinforcement learning (CORL) aims to learn a sequence of tasks from datasets collected over time while preserving performance on previously learned tasks. This setting corresponds to domains where new tasks arise over time but adapting the model through live environment interaction is expensive, risky, or impossible. However, CORL combines the difficulties of offline reinforcement learning with the need to adapt to new tasks without catastrophic forgetting. Replay-based continual learning approaches remain a strong baseline but incur memory overhead and suffer from a distribution mismatch between replayed samples and newly learned policies. At the same time, architectural continual learning methods have shown strong potential in supervised learning but remain underexplored in CORL. In this work, we propose TSN-Affinity, a novel CORL method based on TinySubNetworks and Decision Transformer. The method enables task-specific parameterization and controlled knowledge sharing through an RL-aware reuse strategy that routes tasks according to action compatibility and latent similarity. We evaluate the approach on benchmarks based on Atari games and on simulated manipulation tasks with the Franka Emika Panda robotic arm, covering both discrete and continuous control. Results show strong retention from sparse subnetworks, with routing further improving multi-task performance. Our findings suggest that similarity-guided architectural reuse is a strong and viable alternative to replay-based strategies in the CORL setting. Our code is available at: https://github.com/anonymized-for-submission123/tsn-affinity.