Deriving robust control policies for realistic urban navigation scenarios is not a trivial task. In an end-to-end approach, these policies must map high-dimensional images from the vehicle's cameras to low-level actions such as steering and throttle. While pure Reinforcement Learning (RL) approaches rely exclusively on rewards, Generative Adversarial Imitation Learning (GAIL) agents learn from expert demonstrations while interacting with the environment, which favors GAIL on tasks for which a reward signal is difficult to derive. In this work, we propose the hGAIL architecture for end-to-end autonomous vehicle navigation: it maps sensory perceptions directly to low-level actions while simultaneously learning mid-level input representations of the agent's environment. hGAIL is a hierarchical Adversarial Imitation Learning architecture composed of two main modules: a GAN (Generative Adversarial Network), which generates a Bird's-Eye View (BEV) representation mainly from the images of the vehicle's three frontal cameras, and a GAIL module, which learns to control the vehicle mainly from the GAN's BEV predictions. Our experiments show that GAIL trained exclusively on camera images (without BEV) fails to learn the task at all, while hGAIL, after training, navigates autonomously and successfully through all intersections of the city.
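As a rough illustration of the two-stage data flow described above (cameras to BEV, then BEV to action), the following Python sketch uses toy stand-ins for both modules. All names (`toy_bev_generator`, `toy_policy`, `gail_reward`, `hgail_step`) and the averaging/tanh logic are hypothetical placeholders, not the networks used in this work. The surrogate reward shown is one common GAIL convention, assumed here only to show how a discriminator score can replace a hand-designed reward.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

Image = List[List[float]]  # toy grayscale image: rows of pixel intensities


@dataclass
class Action:
    steer: float     # in [-1, 1]
    throttle: float  # in (0, 1)


def toy_bev_generator(cameras: Sequence[Image]) -> Image:
    """Hypothetical stand-in for the GAN generator.

    Averages the camera images pixel-wise to produce one BEV-like grid.
    The real generator would be a learned network, not an average.
    """
    rows, cols = len(cameras[0]), len(cameras[0][0])
    return [
        [sum(cam[r][c] for cam in cameras) / len(cameras) for c in range(cols)]
        for r in range(rows)
    ]


def toy_policy(bev: Image) -> Action:
    """Hypothetical stand-in for the GAIL policy: maps a BEV grid to an action."""
    cols = len(bev[0])
    mid = cols // 2
    left = sum(sum(row[:mid]) for row in bev)
    right = sum(sum(row[cols - mid:]) for row in bev)
    mean = sum(sum(row) for row in bev) / (len(bev) * cols)
    steer = math.tanh(right - left)            # steer toward the brighter half
    throttle = 1.0 / (1.0 + math.exp(-mean))   # squash mean intensity into (0, 1)
    return Action(steer=steer, throttle=throttle)


def gail_reward(d_prob: float, eps: float = 1e-8) -> float:
    """Surrogate reward -log(1 - D(s, a)), an assumed convention.

    d_prob is the discriminator's probability that (state, action) is expert-like.
    """
    return -math.log(max(1.0 - d_prob, eps))


def hgail_step(
    cameras: Sequence[Image],
    generator: Callable[[Sequence[Image]], Image] = toy_bev_generator,
    policy: Callable[[Image], Action] = toy_policy,
) -> Action:
    """One forward pass of the hierarchy: cameras -> BEV -> low-level action."""
    bev = generator(cameras)
    return policy(bev)


if __name__ == "__main__":
    # Three toy "frontal camera" images, brighter on the right-hand side.
    cams = [[[0.0, 0.0, 1.0, 1.0] for _ in range(2)] for _ in range(3)]
    print(hgail_step(cams))
    print(gail_reward(0.5))
```

Passing the generator and policy as arguments keeps the hierarchy explicit: the policy only ever sees the intermediate BEV, so either stage can be swapped for a trained model without changing `hgail_step`.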