Human-centric image datasets are critical to the development of computer vision technologies. However, recent investigations have foregrounded significant ethical issues related to privacy and bias, which have resulted in the complete retraction, or modification, of several prominent datasets. Recent works have tried to reverse this trend, for example, by proposing analytical frameworks for ethically evaluating datasets, standardized dataset documentation and curation practices, privacy preservation methodologies, and tools for surfacing and mitigating representational biases. Little attention, however, has been paid to the realities of operationalizing ethical data collection. To fill this gap, we present a set of key ethical considerations and practical recommendations for collecting more ethically minded human-centric image data. Our research directly addresses issues of privacy and bias by contributing best practices for ethical data collection to the research community, covering purpose, privacy and consent, and diversity. We motivate each consideration by drawing on lessons from current practices, dataset withdrawals and audits, and analytical ethical frameworks. Our research is intended to augment recent scholarship, representing an important step toward more responsible data curation practices.