This work explores scene graphs as a distilled representation of high-level information for autonomous driving, applied to predicting future driver actions. Given the scarcity and strong imbalance of the available data samples, we propose a self-supervised pipeline that learns representative and well-separated embeddings. Interpretability and explainability are central concerns; we therefore embed attention mechanisms in our architecture that produce spatial and temporal heatmaps over the scene graphs. We evaluate our system on the ROAD dataset against a fully supervised approach and show that our training regime outperforms it.
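The abstract does not detail how the attention heatmaps are computed. The following is a minimal sketch of one common way to obtain both kinds of heatmap from a sequence of scene graphs, not the authors' architecture. It assumes per-frame node embeddings from some graph encoder, plus two hypothetical learned scoring vectors, `w_spatial` and `w_temporal`.

- Spatial attention weights the nodes within each frame, which gives a per-frame heatmap over the graph.
- Temporal attention then weights the pooled frames, which gives a heatmap over time.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def attention_pool(node_emb, w_spatial, w_temporal):
    """Pool a sequence of scene-graph node embeddings into one vector.

    Hypothetical sketch, not the paper's method.
    node_emb:   (T, N, D) embeddings of N graph nodes over T frames,
                assumed to come from some graph encoder.
    w_spatial:  (D,) hypothetical learned scoring vector for nodes.
    w_temporal: (D,) hypothetical learned scoring vector for frames.

    Returns the pooled clip embedding (D,), the spatial heatmap (T, N)
    and the temporal heatmap (T,).
    """
    # Spatial attention: one weight per node, normalized within each frame.
    spatial = softmax(node_emb @ w_spatial, axis=1)          # (T, N)
    frame_emb = np.einsum("tn,tnd->td", spatial, node_emb)  # (T, D)
    # Temporal attention: one weight per frame, normalized over the clip.
    temporal = softmax(frame_emb @ w_temporal, axis=0)       # (T,)
    clip_emb = temporal @ frame_emb                          # (D,)
    return clip_emb, spatial, temporal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, N, D = 8, 5, 16
    emb, spatial_map, temporal_map = attention_pool(
        rng.normal(size=(T, N, D)), rng.normal(size=D), rng.normal(size=D))
    print(emb.shape, spatial_map.shape, temporal_map.shape)  # (16,) (8, 5) (8,)
```

Each row of `spatial_map` sums to one, so it can be read directly as a heatmap over the graph's nodes for that frame. `temporal_map` plays the same role across frames. The pooled `clip_emb` is the kind of vector a self-supervised objective or a downstream action classifier could consume.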