Model-free learning-based control methods have recently shown significant advantages over traditional control methods because they avoid complex vehicle characteristic estimation and parameter tuning. As a primary policy learning method, imitation learning (IL) can learn control policies directly from expert demonstrations. However, the performance of IL policies depends heavily on the sufficiency and quality of the demonstration data. To alleviate this limitation, this paper proposes a lifelong policy learning (LLPL) framework, which extends the IL scheme with lifelong learning (LLL). First, a novel IL-based model-free control policy learning method for path tracking is introduced. Even with imperfect demonstrations, the optimal control policy can be learned directly from historical driving data. Second, using the LLL method, the pre-trained IL policy can be safely updated and fine-tuned with incremental execution knowledge. Third, a knowledge evaluation method for policy learning is introduced to avoid learning redundant or inferior knowledge, thus ensuring that online policy learning improves performance. Experiments are conducted with a high-fidelity vehicle dynamics model in various scenarios to evaluate the proposed method. The results show that the proposed LLPL framework continuously improves policy performance as incremental driving data are collected, and that, after evolving on a 7 km curved road, it achieves the best accuracy and control smoothness among the compared baseline methods. Through learning and evaluation with noisy real-life data collected in an off-road environment, the proposed LLPL framework also demonstrates its applicability to learning and evolving in real-life scenarios.
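The abstract does not specify how knowledge evaluation is carried out, so the following is only a minimal sketch of the general idea of skipping redundant or inferior samples before an online policy update. It assumes that each incremental sample carries a state vector and a recorded tracking error, that "inferior" means the error exceeds a threshold, and that "redundant" means the state lies close to one already stored. The function name `select_incremental_knowledge`, the parameters `novelty_radius` and `max_error`, and their default values are all hypothetical and are not taken from the paper.

```python
import math


def select_incremental_knowledge(memory_states, new_states, new_errors,
                                 novelty_radius=0.5, max_error=0.2):
    """Return indices of new samples that pass a simple knowledge check.

    Hypothetical illustration only: the criteria and thresholds are
    assumptions, not the evaluation method used in the paper.
    """
    # Work on a copy so the caller's memory is not modified.
    memory = [tuple(float(x) for x in s) for s in memory_states]
    kept = []
    for index, (state, error) in enumerate(zip(new_states, new_errors)):
        state = tuple(float(x) for x in state)
        # Inferior knowledge: the recorded tracking error is too large.
        if abs(error) > max_error:
            continue
        # Redundant knowledge: a stored state already lies within the radius.
        if any(math.dist(state, stored) < novelty_radius for stored in memory):
            continue
        kept.append(index)
        # Accepted samples count as stored knowledge for later comparisons.
        memory.append(state)
    return kept


if __name__ == "__main__":
    memory_states = [[0.0, 0.0]]
    new_states = [[0.1, 0.0], [2.0, 0.0], [4.0, 0.0], [2.1, 0.0]]
    new_errors = [0.05, 0.05, 0.9, 0.05]
    print(select_incremental_knowledge(memory_states, new_states, new_errors))
```

Only the samples whose indices are returned would be passed on to the policy update in this sketch. A real system would likely need a richer notion of both novelty and quality than a Euclidean distance and a single error threshold.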