In autonomous driving, the end-to-end (E2E) approach, which predicts vehicle control signals directly from sensor data, is rapidly gaining attention. Learning a safe E2E driving system requires an extensive amount of driving data and human intervention. Vehicle control data must be recorded over many hours of human driving, which makes large vehicle control datasets difficult to build. Publicly available driving datasets often cover only a limited range of driving scenes, and vehicle control data can typically be collected only by vehicle manufacturers. To address these challenges, this paper proposes the first self-supervised learning framework, self-supervised imitation learning (SSIL), that can learn E2E driving networks without using driving command data. To construct pseudo steering angle data, the proposed SSIL predicts a pseudo target from the vehicle's poses at the current and previous time points, which are estimated with light detection and ranging (LiDAR) sensors. Our numerical experiments demonstrate that the proposed SSIL framework achieves E2E driving accuracy comparable to that of its supervised learning counterpart. In addition, our qualitative analyses using a conventional visual explanation tool show that neural networks trained with the proposed SSIL and with its supervised counterpart attend to similar objects when making predictions.
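To make the pseudo-label idea concrete, the following is a minimal sketch of how a steering-angle target could be derived from two consecutive vehicle poses. The abstract does not state the exact formula SSIL uses, so this is an illustration under stated assumptions rather than the paper's method: it assumes planar poses `(x, y, yaw)` in a common world frame, a kinematic bicycle model, and a hypothetical wheelbase of 2.7 m. The function name `pseudo_steering_angle` is also hypothetical.

```python
import math


def pseudo_steering_angle(prev_pose, curr_pose, wheelbase_m=2.7):
    """Estimate a pseudo steering angle (radians) from two consecutive poses.

    Assumptions (not taken from the paper): poses are (x, y, yaw) tuples in
    metres and radians, and the vehicle follows a kinematic bicycle model,
    where tan(steering) = wheelbase * d(yaw) / d(arc length).
    """
    x0, y0, yaw0 = prev_pose
    x1, y1, yaw1 = curr_pose

    # Distance travelled between the two time points (chord approximates arc).
    distance = math.hypot(x1 - x0, y1 - y0)
    if distance < 1e-6:
        # The vehicle barely moved, so the heading change carries no
        # usable curvature information; fall back to a neutral target.
        return 0.0

    # Heading change wrapped to [-pi, pi) so that crossing +/-pi is handled.
    dyaw = (yaw1 - yaw0 + math.pi) % (2.0 * math.pi) - math.pi

    # Path curvature (1 / turning radius), then the bicycle-model steering angle.
    curvature = dyaw / distance
    return math.atan(wheelbase_m * curvature)


if __name__ == "__main__":
    # Pose estimates at the previous and current time points (hypothetical values).
    previous = (0.0, 0.0, 0.0)
    current = (1.0, 0.0, 0.1)
    print(pseudo_steering_angle(previous, current))
```

In a pipeline of this kind, the poses would come from LiDAR-based pose estimation, and the resulting angle would serve as the training target in place of a recorded driving command. A positive value here corresponds to a counter-clockwise (left) turn under the usual yaw convention.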