Human-centric computer vision (HCCV) data curation practices often neglect privacy and bias concerns, leading to dataset retractions and unfair models. Further, HCCV datasets constructed through nonconsensual web scraping lack the necessary metadata for comprehensive fairness and robustness evaluations. Current remedies address issues post hoc, lack persuasive justification for adoption, or fail to provide proper contextualization for appropriate application. Our research focuses on proactive, domain-specific recommendations for curating HCCV datasets, addressing privacy and bias. We adopt an ante hoc reflective perspective and draw from current practices and guidelines, guided by the ethical framework of principlism.