Despite extensive efforts to create fairer machine learning (ML) datasets, there remains a limited understanding of the practical aspects of dataset curation. Drawing on interviews with 30 ML dataset curators, we present a comprehensive taxonomy of the challenges and trade-offs encountered throughout the dataset curation lifecycle. Our findings underscore overarching issues within the broader fairness landscape that impact data curation. We conclude with recommendations for systemic changes that would better support fair dataset curation practices.