This paper addresses the often-overlooked issue of fairness in the autonomous driving domain, particularly in vision-based perception and prediction systems, which play a pivotal role in the overall functioning of Autonomous Vehicles (AVs). We focus our analysis on biases present in some of the most commonly used visual datasets for training person and vehicle detection systems. We introduce an annotation methodology and a specialised annotation tool, both designed to annotate protected attributes of agents in visual datasets. We validate our methodology through an inter-rater agreement analysis and provide the distribution of attributes across all datasets. The annotations cover age, sex, skin tone, group, and means of transport for more than 90K people, as well as vehicle type, colour, and car type for over 50K vehicles. Overall, diversity is very low for most attributes, and some groups, such as children, wheelchair users, and personal mobility vehicle users, are extremely underrepresented in the analysed datasets. The study contributes significantly to efforts to account for fairness in the evaluation of perception and prediction systems for AVs. This paper follows reproducibility principles: the annotation tool, scripts, and annotated attributes are publicly available at https://github.com/ec-jrc/humaint_annotator.