Robust autonomous driving requires agents to accurately identify unexpected areas in urban scenes. To this end, some critical issues remain open: how to design a suitable metric for measuring anomalies, and how to properly generate training samples of anomaly data. Previous efforts usually resort to uncertainty estimation and sample synthesis borrowed from classification tasks, which ignore context information and sometimes require auxiliary datasets with fine-grained annotations. In contrast, in this paper we exploit the strongly context-dependent nature of the segmentation task and design an energy-guided self-supervised framework for anomaly segmentation, which optimizes an anomaly head by maximizing the likelihood of self-generated anomaly pixels. Specifically, we design two estimators of anomaly likelihood: one is a simple task-agnostic binary estimator, and the other expresses anomaly likelihood as the residual of a task-oriented energy model. Based on the proposed estimators, we further equip our framework with a likelihood-guided mask refinement process that extracts informative anomaly pixels for model training. We conduct extensive experiments on the challenging Fishyscapes and Road Anomaly benchmarks, demonstrating that, without any auxiliary data or synthetic models, our method still achieves performance competitive with other state-of-the-art schemes.
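For readers unfamiliar with energy-based anomaly scores, the following is a minimal sketch of the commonly used per-pixel free-energy score computed from segmentation logits. It is not confirmed to be the estimator proposed in this paper: the abstract only states that one estimator is the residual of a task-oriented energy model. The function names `pixel_energy` and `anomaly_mask`, the `(C, H, W)` logit layout, and the fixed threshold are all assumptions made for illustration.

```python
import numpy as np


def pixel_energy(logits: np.ndarray) -> np.ndarray:
    """Per-pixel free energy E = -log(sum_c exp(logit_c)).

    `logits` is assumed to have shape (C, H, W), with C the number of
    known classes. Higher energy means lower confidence in any known class.
    """
    # Subtract the per-pixel maximum for a numerically stable log-sum-exp.
    m = logits.max(axis=0, keepdims=True)
    lse = m[0] + np.log(np.exp(logits - m).sum(axis=0))
    return -lse


def anomaly_mask(logits: np.ndarray, threshold: float) -> np.ndarray:
    """Hypothetical helper: flag pixels whose energy exceeds `threshold`."""
    return pixel_energy(logits) > threshold


# Toy example with 3 classes and a 1x2 image:
# pixel (0, 0) is confidently class 0, pixel (0, 1) has flat logits.
toy_logits = np.array([[[10.0, 0.0]],
                       [[0.0, 0.0]],
                       [[0.0, 0.0]]])
toy_energy = pixel_energy(toy_logits)
toy_mask = anomaly_mask(toy_logits, threshold=-5.0)
```

In this toy case the flat-logit pixel receives the higher energy and is the only one flagged. A score like this could serve as a starting point for selecting candidate anomaly pixels, but the threshold here is arbitrary and the refinement procedure described in the abstract is not reproduced.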