In this work, we tackle two vital tasks in automated driving systems: driver intention prediction and risk object identification from egocentric images. In particular, we investigate the question: what are good road scene-level representations for these two tasks? We contend that a scene-level representation must capture the higher-level semantic and geometric structure of the traffic scene around the ego-vehicle as it performs actions toward its destination. To this end, we introduce the representation of semantic regions, which are the areas that the ego-vehicle visits while taking an afforded action (e.g., a left turn at a 4-way intersection). We propose to learn scene-level representations via a novel semantic region prediction task and an automatic semantic region labeling algorithm. We conduct extensive evaluations on the HDD and nuScenes datasets, where the learned representations lead to state-of-the-art performance for driver intention prediction and risk object identification.