Prior work in 3D object detection evaluates models using offline metrics such as average precision, since closed-loop online evaluation on the downstream driving task is costly. However, it is unclear how indicative offline results are of driving performance. In this work, we perform the first empirical evaluation of how predictive different detection metrics are of driving performance when detectors are integrated into a full self-driving stack. We conduct extensive experiments on urban driving in the CARLA simulator using 16 object detection models. We find that the nuScenes Detection Score correlates more strongly with driving performance than the widely used average precision metric. In addition, our results call for caution against relying exclusively on the emerging class of "planner-centric" metrics.
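The comparison above amounts to scoring each detector with an offline metric, measuring its driving performance in closed loop, and checking how well the two agree across models. The sketch below illustrates that step under stated assumptions. It computes the nuScenes Detection Score from its standard definition, NDS = (5 · mAP + Σ (1 − min(1, e))) / 10 over the five true-positive errors (translation, scale, orientation, velocity, attribute), and it measures agreement with Pearson and Spearman rank correlation. The abstract does not say which correlation coefficient is used, so these two are assumptions. The function names and the per-detector numbers in the usage block are hypothetical toy values, not results from this work.

```python
import math


def nds(mean_ap, tp_errors):
    """nuScenes Detection Score from mAP and the five mean true-positive errors.

    Each error is clipped at 1 so that a very poor error contributes 0, not a negative score.
    """
    if len(tp_errors) != 5:
        raise ValueError("expected 5 TP errors (mATE, mASE, mAOE, mAVE, mAAE)")
    return (5.0 * mean_ap + sum(1.0 - min(1.0, e) for e in tp_errors)) / 10.0


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical toy values for four detectors (illustration only, not measured data).
    maps = [0.30, 0.42, 0.35, 0.50]
    tp_errs = [
        [0.60, 0.30, 0.50, 1.20, 0.30],
        [0.50, 0.28, 0.40, 0.90, 0.25],
        [0.45, 0.27, 0.35, 0.80, 0.20],
        [0.40, 0.26, 0.30, 0.70, 0.20],
    ]
    driving_scores = [55.0, 61.0, 63.0, 70.0]  # hypothetical closed-loop driving scores
    nds_scores = [nds(m, e) for m, e in zip(maps, tp_errs)]
    print("Pearson(AP, driving): ", pearson(maps, driving_scores))
    print("Spearman(AP, driving):", spearman(maps, driving_scores))
    print("Pearson(NDS, driving):", pearson(nds_scores, driving_scores))
    print("Spearman(NDS, driving):", spearman(nds_scores, driving_scores))
```

Rank correlation is included alongside Pearson because, when choosing among detectors, what usually matters is whether a metric orders models the same way driving performance does, not whether the relationship is linear.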