Dataset documentation is widely recognized as essential for the responsible development of automated systems. Despite growing efforts to support documentation through different kinds of artifacts, little is known about the motivations shaping documentation tool design or the factors hindering the adoption of these tools. We present a systematic review supported by mixed-methods analysis of 59 dataset documentation publications to examine the motivations behind building documentation tools, how authors conceptualize documentation practices, and how these tools connect to existing systems, regulations, and cultural norms. Our analysis shows four persistent patterns in dataset documentation conceptualization that potentially impede adoption and standardization: unclear operationalizations of documentation's value, decontextualized designs, unaddressed labor demands, and a tendency to treat integration as future work. Building on these findings, we propose a shift in Responsible AI tool design toward institutional rather than individual solutions, and outline actions the HCI community can take to enable sustainable documentation practices.