Previous studies have shown that image datasets lacking geographic diversity can lead to biased performance in models trained on them. While earlier work studied general-purpose image datasets (e.g., ImageNet) and simple tasks such as image recognition, we investigated geo-biases in real-world driving datasets on a more complex task: instance segmentation. We examined whether instance segmentation models trained on European driving scenes (Eurocentric models) are geo-biased and, consistent with previous work, found that they are. Interestingly, we found that the geo-biases came from classification errors rather than localization errors, with classification errors alone contributing 10-90% of the geo-biases in segmentation and 19-88% of the geo-biases in detection. This indicates that while classification is geo-biased, localization (including detection and segmentation) is geographically robust. Our findings show that in region-specific models (e.g., Eurocentric models), geo-biases from classification errors can be significantly mitigated by using coarser classes (e.g., grouping car, bus, and truck as 4-wheeler).
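The coarser-class mitigation described above can be illustrated with a minimal sketch. It assumes predictions and ground-truth labels are already matched one-to-one as plain strings; the helper names (`to_coarse`, `classification_error`) and the toy labels are hypothetical, and only the grouping of car, bus, and truck into 4-wheeler comes from the text. It is not the evaluation protocol used in the study.

```python
# Hypothetical coarse-class mapping. Only the car/bus/truck -> 4-wheeler
# grouping is taken from the text; any label not listed maps to itself.
COARSE_CLASSES = {
    "car": "4-wheeler",
    "bus": "4-wheeler",
    "truck": "4-wheeler",
}


def to_coarse(label):
    """Map a fine-grained label to its coarse class (identity if unmapped)."""
    return COARSE_CLASSES.get(label, label)


def classification_error(predictions, targets, coarse=False):
    """Fraction of matched instances whose predicted class is wrong.

    Assumes predictions[i] and targets[i] refer to the same localized
    instance, so localization errors are excluded by construction.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one instance is required")
    if coarse:
        predictions = [to_coarse(p) for p in predictions]
        targets = [to_coarse(t) for t in targets]
    wrong = sum(p != t for p, t in zip(predictions, targets))
    return wrong / len(targets)


# Toy, made-up labels for illustration only (not data from the study).
targets = ["car", "bus", "truck", "person"]
predictions = ["car", "truck", "car", "person"]

fine_error = classification_error(predictions, targets)
coarse_error = classification_error(predictions, targets, coarse=True)
print(fine_error, coarse_error)
```

In this toy case the two confusions are between vehicle types that fall into the same coarse class, so they count as errors at the fine-grained level but not at the coarse level.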