Background: The development of AI-enabled software depends heavily on AI model documentation, such as model cards, because software engineers and model developers have different domain expertise. From an ethical standpoint, AI model documentation conveys critical information about ethical considerations, along with mitigation strategies, that downstream developers need to deliver ethically compliant software. However, knowledge of such documentation practices remains scarce. Aims: Our study investigates how developers document ethical aspects of open source AI models in practice, with the aim of providing recommendations for future documentation endeavours. Method: We selected three sources of documentation on GitHub and Hugging Face and developed a keyword set to identify ethics-related documents systematically. After filtering an initial set of 2,347 documents, we identified 265 relevant ones and performed thematic analysis to derive the themes of ethical considerations. Results: Six themes emerged, the three largest being model behavioural risks, model use cases, and model risk mitigation. Conclusions: Our findings reveal that open source AI model documentation focuses on articulating ethical problem statements and use case restrictions. We further provide suggestions to various stakeholders for improving documentation practices regarding ethical considerations.