Powered by deep representation learning, reinforcement learning (RL) provides an end-to-end learning framework capable of solving self-driving (SD) tasks without manual design. However, time-varying nonstationary environments cause proficient but specialized RL policies to fail at execution time. For example, an RL-based SD policy trained on sunny days does not generalize well to rainy weather. Even though meta-learning enables the RL agent to adapt to new tasks/environments, its offline operation fails to equip the agent with the ability to adapt online when facing nonstationary environments. This work proposes an online meta reinforcement learning algorithm based on \emph{conjectural online lookahead adaptation} (COLA). COLA determines the online adaptation at every step by maximizing the agent's conjecture of its future performance over a lookahead horizon. Experimental results demonstrate that under dynamically changing weather and lighting conditions, COLA-based self-adaptive driving outperforms the baseline policies in terms of online adaptability. A demo video, source code, and appendices are available at {\tt https://github.com/Panshark/COLA}.
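To make the idea of "maximizing the agent's conjecture of its future performance over a lookahead horizon" concrete, here is a minimal toy sketch. It is not the COLA algorithm from the paper; every modeling choice is an assumption for illustration only: the environment is summarized by a hypothetical scalar context (for example, rain intensity), the policy by a scalar parameter `phi`, the per-step reward is assumed to be `-(phi - context)**2`, and the agent's conjecture of future contexts is a hypothetical linear extrapolation of the last observed drift. The functions `conjectured_contexts`, `lookahead_objective`, and `lookahead_adapt` are illustrative names, not part of the linked repository.

```python
import numpy as np


def conjectured_contexts(history, horizon):
    """Hypothetical conjecture model (assumption): linearly extrapolate the
    most recent drift of a scalar environment context over the horizon."""
    drift = history[-1] - history[-2] if len(history) > 1 else 0.0
    return np.array([history[-1] + drift * (k + 1) for k in range(horizon)])


def lookahead_objective(phi, predicted, gamma):
    """Discounted conjectured return over the lookahead horizon.
    Toy reward model (assumption): r = -(phi - context)**2."""
    weights = gamma ** np.arange(len(predicted))
    return float(np.sum(weights * -(phi - predicted) ** 2))


def lookahead_adapt(phi, history, horizon=5, gamma=0.9, lr=0.1, steps=50):
    """One online adaptation step: gradient ascent on the conjectured
    lookahead objective, warm-started from the current policy parameter."""
    predicted = conjectured_contexts(history, horizon)
    weights = gamma ** np.arange(horizon)
    for _ in range(steps):
        # Gradient of sum_k w_k * -(phi - c_k)^2 with respect to phi.
        grad = np.sum(weights * -2.0 * (phi - predicted))
        phi = phi + lr * grad
    return float(phi)


if __name__ == "__main__":
    history = [0.0, 0.1]  # observed contexts so far (toy values)
    phi = lookahead_adapt(0.0, history)
    print(phi, lookahead_objective(phi, conjectured_contexts(history, 5), 0.9))
```

In this toy setting the adapted parameter converges to the discount-weighted mean of the conjectured future contexts, so the agent deliberately "leads" the drift instead of matching only the current condition; a real SD policy would replace the scalar parameter, reward model, and conjecture with learned components.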