Transfer learning in deep reinforcement learning is often motivated by improved stability and reduced training cost, but it can also fail under substantial domain shift. This paper presents a controlled empirical study of how architectural differences between Double Deep Q-Networks (DDQN) and Dueling DQN influence transfer behavior across environments. Using CartPole as the source task and the structurally distinct LunarLander as the target task, we evaluate a fixed layer-wise representation transfer protocol under identical hyperparameters and training conditions, with baseline agents trained from scratch used to contextualize transfer effects. Under this protocol, DDQN consistently avoids negative transfer and exhibits learning dynamics in the target environment comparable to its from-scratch baseline. Dueling DQN, under identical conditions, consistently exhibits negative transfer, characterized by degraded rewards and unstable optimization. Statistical analysis across multiple random seeds confirms that the performance gap under transfer is significant. These findings suggest that architectural inductive bias is strongly associated with robustness to cross-environment transfer in value-based deep reinforcement learning, at least under the examined transfer protocol.
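To make the architectural contrast concrete, the following is a minimal PyTorch sketch of the two value heads, assuming a two-layer MLP feature extractor. The layer widths, depth, and the mean-subtracted dueling aggregation are illustrative choices rather than the paper's reported configuration, and the class names DDQNNetwork and DuelingNetwork are ours. The sketch highlights that DDQN's "double" estimator alters only the bootstrapping target, whereas the dueling decomposition alters the network topology itself.

```python
import torch
import torch.nn as nn


class DDQNNetwork(nn.Module):
    """Plain Q-network as used with Double DQN: a single MLP head maps
    state features to one Q-value per action. The "double" estimator in
    DDQN changes the target computation, not the architecture."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.q_head(self.features(state))


class DuelingNetwork(nn.Module):
    """Dueling head: shared features split into a scalar state-value
    stream V(s) and an advantage stream A(s, a), recombined as
    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value_head = nn.Linear(hidden, 1)
        self.advantage_head = nn.Linear(hidden, n_actions)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.features(state)
        v = self.value_head(h)      # shape (batch, 1)
        a = self.advantage_head(h)  # shape (batch, n_actions)
        # Mean-subtraction keeps the V/A decomposition identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```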
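A layer-wise representation transfer from CartPole (4-dimensional observations, 2 actions) to LunarLander (8-dimensional observations, 4 actions) could then be sketched as below, reusing the DDQNNetwork class from the previous sketch. The transfer_hidden_layers helper and its copy-what-matches rule are hypothetical: the abstract states only that a fixed layer-wise protocol was applied under identical hyperparameters, not how dimension mismatches between the two environments were handled.

```python
from torch import nn


def transfer_hidden_layers(source: nn.Module, target: nn.Module) -> None:
    """Copy every parameter whose name and shape match between source and
    target; leave the rest (e.g. the dimension-mismatched input and output
    weights) at their fresh initialization. Hypothetical helper: the paper
    specifies only that a fixed layer-wise protocol was used."""
    src_state = source.state_dict()
    tgt_state = target.state_dict()
    for name, param in src_state.items():
        if name in tgt_state and tgt_state[name].shape == param.shape:
            tgt_state[name].copy_(param)
    target.load_state_dict(tgt_state)


# CartPole-v1: 4-dim observations, 2 actions; LunarLander-v2: 8-dim, 4 actions.
source_net = DDQNNetwork(state_dim=4, n_actions=2)  # stands in for the CartPole agent
target_net = DDQNNetwork(state_dim=8, n_actions=4)  # to be trained on LunarLander
transfer_hidden_layers(source_net, target_net)      # only shape-compatible parameters move
```

Under these assumptions, only the shared hidden-to-hidden layer (and any coincidentally shape-matched biases) carries over, while the environment-specific input and output layers are relearned in the target task.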