Human-centric computer vision (HCCV) data curation practices often neglect privacy and bias concerns, leading to dataset retractions and unfair models. HCCV datasets constructed through nonconsensual web scraping lack crucial metadata for comprehensive fairness and robustness evaluations. Current remedies are post hoc, lack persuasive justification for adoption, or fail to provide the contextualization needed for appropriate application. Our research addresses these privacy and bias concerns by offering proactive, domain-specific recommendations for curating HCCV evaluation datasets, covering purpose, privacy and consent, and diversity. We adopt an ante hoc reflective perspective, drawing on current practices, guidelines, dataset withdrawals, and audits to inform our considerations and recommendations.