This paper conducts fairness testing on automated pedestrian detection, a crucial but under-explored issue in autonomous driving systems. We evaluate eight widely studied pedestrian detectors across demographic groups on large-scale real-world datasets. To enable thorough fairness testing, we annotate the datasets extensively, yielding 8,311 images with 16,070 gender labels, 20,115 age labels, and 3,513 skin tone labels. Our findings reveal significant fairness issues related to age and skin tone: detection accuracy for adults is 19.67% higher than for children, and there is a 7.52% accuracy disparity between light-skin and dark-skin individuals. Gender, by contrast, shows only a 1.1% difference in detection accuracy. We also investigate scenarios commonly explored in the literature on autonomous driving testing and find that the bias against dark-skin pedestrians increases significantly under low contrast and low brightness. We publicly release the code, data, and results to support future research on fairness in autonomous driving.
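The per-group accuracy gaps reported above can be illustrated with a minimal sketch. The abstract does not define how detection accuracy is computed, so this example assumes a simple detection rate (the fraction of labeled pedestrians in a group that the detector finds) and takes the disparity as the difference between two groups' rates. The function names and the toy records are hypothetical and are not taken from the released code or data.

```python
from collections import defaultdict


def detection_rate_by_group(records):
    """Return {group: fraction detected} for (group, detected) pairs.

    Assumption: "detection accuracy" is approximated here by the share of
    labeled pedestrians in each group that the detector found.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, detected in records:
        totals[group] += 1
        if detected:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def disparity(rates, group_a, group_b):
    """Difference in detection rate between two groups (a minus b)."""
    return rates[group_a] - rates[group_b]


# Toy, made-up records for illustration only (not results from the paper).
toy_records = (
    [("adult", True)] * 9 + [("adult", False)] * 1
    + [("child", True)] * 7 + [("child", False)] * 3
)

rates = detection_rate_by_group(toy_records)
gap = disparity(rates, "adult", "child")
print(f"adult={rates['adult']:.2f} child={rates['child']:.2f} gap={gap:.2f}")
```

The same two functions apply unchanged to any labeled attribute, for example skin tone or gender, provided each record carries the group label and a detected flag.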