TSN-Affinity: Similarity-Driven Parameter Reuse for Continual Offline Reinforcement Learning

Continual offline reinforcement learning (CORL) aims to learn a sequence of tasks from datasets collected over time while preserving performance on previously learned tasks. This setting corresponds to domains where new tasks arise over time, but adapting the model through live environment interaction is expensive, risky, or impossible. CORL therefore inherits two difficulties at once: learning from fixed offline data, and adapting to new tasks without catastrophic forgetting. Replay-based continual learning approaches remain a strong baseline but incur memory overhead and suffer from a distribution mismatch between replayed samples and newly learned policies. Architectural continual learning methods, meanwhile, have shown strong potential in supervised learning but remain underexplored in CORL. In this work, we propose TSN-Affinity, a novel CORL method based on TinySubNetworks and Decision Transformer. The method enables task-specific parameterization and controlled knowledge sharing through an RL-aware reuse strategy that routes tasks according to action compatibility and latent similarity. We evaluate the approach on benchmarks based on Atari games and on simulated manipulation tasks with the Franka Emika Panda robotic arm, covering both discrete and continuous control. Results show strong retention from sparse subnetworks, with routing further improving multi-task performance. Our findings suggest that similarity-guided architectural reuse is a strong and viable alternative to replay-based strategies in the CORL setting. Our code is available at: https://github.com/anonymized-for-submission123/tsn-affinity.
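
To make the routing idea concrete, the following is a minimal sketch of similarity-driven reuse, not the authors' implementation. It assumes each task is summarized by an action-space descriptor and a latent embedding vector, and that a new task reuses parameters from the most similar compatible task only if cosine similarity reaches a threshold. The names `route_task` and `cosine_similarity`, the dictionary layout, and the threshold value are all hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def route_task(new_task, known_tasks, threshold=0.8):
    """Pick the known task whose parameters the new task could reuse.

    Hypothetical rule: a candidate must share the new task's action space
    (action compatibility) and have a latent embedding with cosine
    similarity >= threshold (latent similarity). Returns the name of the
    best candidate, or None when no candidate qualifies.
    """
    best_name, best_score = None, threshold
    for name, task in known_tasks.items():
        # Action compatibility: skip tasks with a different action space.
        if task["action_space"] != new_task["action_space"]:
            continue
        # Latent similarity: keep the highest-scoring compatible task.
        score = cosine_similarity(task["latent"], new_task["latent"])
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Illustrative task summaries (made-up values, not taken from the paper).
known = {
    "pong": {"action_space": ("discrete", 6), "latent": [1.0, 0.0, 0.0]},
    "reach": {"action_space": ("continuous", 7), "latent": [1.0, 0.0, 0.1]},
}
new = {"action_space": ("discrete", 6), "latent": [0.9, 0.1, 0.0]}
print(route_task(new, known))
```

Returning `None` here stands for the case where no earlier task is similar enough, in which case a task would get its own subnetwork rather than sharing parameters. How the actual method computes latent similarity and decides what to share is not specified in this abstract.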
