Reinforcement Learning (RL) agents often struggle in real-world applications where environmental conditions are non-stationary, particularly when reward functions shift or the available action space expands. This paper introduces MORPHIN, a self-adaptive Q-learning framework that enables on-the-fly adaptation without full retraining. By integrating concept drift detection with dynamic adjustment of learning and exploration hyperparameters, MORPHIN adapts agents both to shifts in the reward function and to runtime expansions of the action space, while preserving prior policy knowledge to prevent catastrophic forgetting. We validate our approach on a Gridworld benchmark and a traffic signal control simulation. The results demonstrate that MORPHIN achieves faster convergence and sustained adaptation compared to a standard Q-learning baseline, improving learning efficiency by up to 1.7x.
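The mechanism summarized above can be sketched in a minimal, illustrative form. The class below is an assumption-laden toy, not the paper's implementation: it pairs tabular Q-learning with a naive sliding-window drift test on recent rewards (a stand-in for whatever concept drift detector MORPHIN uses), boosts the learning rate and exploration rate when drift is flagged, and grows the Q-table in place when new actions become available so that existing Q-values are preserved. All names, thresholds, and window sizes are hypothetical.

```python
import random

class AdaptiveQAgent:
    """Toy sketch of the abstract's idea: Q-learning with (a) reward-drift
    detection, (b) hyperparameter boosts on detected drift, and (c) in-place
    Q-table expansion for newly available actions. Illustrative only."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.recent = []   # sliding window of rewards for drift detection
        self.window = 50   # hypothetical window size

    def select_action(self, state):
        # Epsilon-greedy action selection over the (possibly expanded) row
        if random.random() < self.epsilon:
            return random.randrange(len(self.q[state]))
        row = self.q[state]
        return row.index(max(row))

    def update(self, s, a, r, s_next):
        # Standard tabular Q-learning update
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
        self._check_drift(r)

    def _check_drift(self, r):
        # Naive drift test: compare mean reward of the two window halves;
        # a real detector (e.g. ADWIN, Page-Hinkley) would replace this.
        self.recent.append(r)
        if len(self.recent) < self.window:
            return
        half = self.window // 2
        old_mean = sum(self.recent[:half]) / half
        new_mean = sum(self.recent[half:]) / half
        if abs(new_mean - old_mean) > 1.0:            # illustrative threshold
            self.alpha = min(0.5, self.alpha * 2)     # relearn faster
            self.epsilon = min(0.5, self.epsilon * 2) # explore more
        self.recent = self.recent[-half:]

    def expand_actions(self, n_new):
        # Grow every Q-row in place: prior Q-values are untouched, which is
        # one simple way to avoid catastrophic forgetting of the old policy.
        for row in self.q:
            row.extend([0.0] * n_new)
```

A usage pattern would be to call `expand_actions` whenever the environment exposes new actions mid-training, then continue calling `select_action`/`update` as usual; the old policy remains intact while the new action values are learned from their optimistic-or-zero initialization.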