What's documented in AI? Systematic Analysis of 32K AI Model Cards

The rapid proliferation of AI models has underscored the importance of thorough documentation, which enables users to understand, trust, and effectively use these models in various applications. Although developers are encouraged to produce model cards, it is not clear how much information, or what information, these cards contain. In this study, we conduct a comprehensive analysis of the documentation of 32,111 AI models on Hugging Face, a leading platform for distributing and deploying AI models. Our investigation sheds light on prevailing model card documentation practices. Most AI models with substantial downloads provide model cards, though the cards vary in how informative they are. We find that the sections addressing environmental impact, limitations, and evaluation have the lowest filled-out rates, while the training section is the most consistently filled out. We analyze the content of each section to characterize practitioners' priorities. Interestingly, there is substantial discussion of data, sometimes with equal or even greater emphasis than the model itself. To evaluate the impact of model cards, we conducted an intervention study in which we added detailed model cards to 42 popular models that previously had no model card or only a sparse one. We find that adding model cards is moderately correlated with an increase in weekly download rates. Our study opens up a new perspective for analyzing community norms and practices for model documentation through large-scale data science and linguistic analysis.
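To make the notion of a section's filled-out rate concrete, here is a minimal sketch of how it could be computed from raw model card text. It is not the study's pipeline: the section names in `SECTIONS`, the `PLACEHOLDER` string, and the helper names `filled_sections` and `fill_rates` are all assumptions made for illustration, and a section is counted as filled out if any non-placeholder line appears under a matching Markdown heading.

```python
import re
from collections import Counter

# Hypothetical section names; the study's actual taxonomy may differ.
SECTIONS = ["training", "evaluation", "limitations", "environmental impact"]
# Assumed placeholder text marking a section that was left empty.
PLACEHOLDER = "[More Information Needed]"

# Matches Markdown ATX headings such as "# Training" or "## Evaluation".
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def filled_sections(card_text):
    """Return the tracked sections that contain non-placeholder body text."""
    filled = set()
    current = None
    for line in card_text.splitlines():
        match = HEADING.match(line)
        if match:
            title = match.group(1).strip().lower()
            # A heading that matches no tracked section resets `current`.
            current = next((s for s in SECTIONS if s in title), None)
        elif current and line.strip() and line.strip() != PLACEHOLDER:
            filled.add(current)
    return filled


def fill_rates(cards):
    """Return, per tracked section, the fraction of cards that fill it out."""
    if not cards:
        return {s: 0.0 for s in SECTIONS}
    counts = Counter()
    for card in cards:
        counts.update(filled_sections(card))
    return {s: counts[s] / len(cards) for s in SECTIONS}


# Two made-up cards, for illustration only.
cards = [
    "# Training\nTrained on X.\n# Evaluation\n[More Information Needed]\n",
    "## Training Details\nAdam.\n## Limitations\nBiased.\n",
]
rates = fill_rates(cards)
print(rates)
```

Matching headings by substring is a deliberately crude choice: it tolerates variants such as "Training Details", but it would miss body text placed under an unrelated sub-heading inside a tracked section.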

