Data leakage is a critical issue when training and evaluating any method based on supervised learning. The state-of-the-art methods for online mapping are based on supervised learning and are trained predominantly on two datasets: nuScenes and Argoverse 2. These datasets revisit the same geographic locations across the training, validation, and test sets. Specifically, over $80$% of nuScenes and $40$% of Argoverse 2 validation and test samples are located less than $5$ m from a training sample. This allows methods to localize within a memorized implicit map during testing and leads to inflated performance numbers being reported. To reveal the true performance in unseen environments, we introduce geographical splits of the data. When existing online mapping models are retrained and reevaluated on the proposed splits, experimental results show significantly lower performance, with some methods dropping by more than $45$ mAP. Additionally, a reassessment of prior design choices leads to conclusions that diverge from those based on the original split. Notably, the impact of the lifting method and of support from auxiliary tasks (e.g., depth supervision) on performance appears less substantial, or follows a different trajectory, than previously perceived. The geographical splits are available at https://github.com/LiljaAdam/geographical-splits
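The overlap statistic quoted above (the share of validation and test samples lying less than 5 m from a training sample) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `fraction_near_train` is hypothetical, and the sketch assumes that sample positions are given as planar (x, y) coordinates in metres in a shared map frame, and that a brute-force nearest-neighbour search is acceptable.

```python
import math


def fraction_near_train(train_xy, eval_xy, radius_m=5.0):
    """Return the fraction of evaluation positions that lie closer than
    radius_m to at least one training position.

    Assumption: positions are (x, y) pairs in metres in a common planar frame.
    """
    if not train_xy or not eval_xy:
        raise ValueError("both position lists must be non-empty")

    near = 0
    for ex, ey in eval_xy:
        # Brute-force distance to the nearest training position.
        nearest = min(math.hypot(ex - tx, ey - ty) for tx, ty in train_xy)
        if nearest < radius_m:
            near += 1
    return near / len(eval_xy)


# Toy positions, made up for illustration only (not taken from either dataset).
train = [(0.0, 0.0), (100.0, 0.0)]
val = [(1.0, 1.0), (50.0, 0.0), (102.0, 3.0), (200.0, 200.0)]
print(fraction_near_train(train, val))
```

A value near zero would indicate that the evaluation set covers locations the model has not seen during training, which is what a geographical split aims for. For datasets of realistic size, a spatial index such as a k-d tree would be a more practical choice than the quadratic loop used here.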