Deep Neural Networks (DNNs) are being adopted as components in software systems. Creating and specializing DNNs from scratch has become increasingly difficult as state-of-the-art architectures grow more complex. Following the path of traditional software engineering, machine learning engineers have begun to reuse large-scale pre-trained models (PTMs) and fine-tune these models for downstream tasks. Prior work has studied reuse practices for traditional software packages to guide software engineers toward better package maintenance and dependency management. We lack a similar foundation of knowledge to guide behaviors in pre-trained model ecosystems. In this work, we present the first empirical investigation of PTM reuse. We interviewed 12 practitioners from the most popular PTM ecosystem, Hugging Face, to learn the practices and challenges of PTM reuse. From this data, we model the decision-making process for PTM reuse. Based on the identified practices, we describe useful attributes for model reuse, including provenance, reproducibility, and portability. We identify three challenges for PTM reuse: missing attributes, discrepancies between claimed and actual performance, and model risks. We substantiate these challenges with systematic measurements in the Hugging Face ecosystem. Our work informs future directions for optimizing deep learning ecosystems by automatically measuring useful attributes and potential attacks, and envisions future research on infrastructure and standardization for model registries.