As open-source AI software projects become an integral component of AI software development, it is critical to develop novel methods that help developers ensure and measure the security of these projects. Code ownership, pivotal in the evolution of such projects, offers insights into developer engagement and potential vulnerabilities. In this paper, we leverage code ownership metrics to empirically investigate their correlation with latent vulnerabilities across five prominent open-source AI software projects. The findings from this large-scale empirical study suggest that high-level ownership (characterised by a limited number of minor contributors) is associated with fewer vulnerabilities. Furthermore, we introduce time metrics anchored on the project's duration, individual source code file timelines, and the count of impacted releases. These metrics categorise distinct phases of open-source AI software projects and their respective vulnerability intensities. Building on these novel code ownership metrics, we have implemented a Python-based command-line application to aid project curators and quality assurance professionals in evaluating and benchmarking their on-site projects. We anticipate that this work will initiate continued research on securing open-source AI projects and measuring their security.
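To make the notion of code ownership concrete, the following is a minimal Python sketch of per-file ownership metrics. It is an illustration under stated assumptions, not the tool described above: the abstract does not define the metrics, so the sketch assumes a common convention from the code ownership literature, in which a contributor's share is their fraction of the commits touching a file and a minor contributor is one whose share falls below a threshold. The function name `ownership_metrics`, the 5% threshold, and the sample authors are all hypothetical.

```python
from collections import Counter

# Assumed cut-off for a "minor" contributor; not taken from the paper.
MINOR_THRESHOLD = 0.05


def ownership_metrics(commit_authors, minor_threshold=MINOR_THRESHOLD):
    """Summarise ownership of one file from the authors of its commits.

    `commit_authors` holds one entry per commit that touched the file.
    """
    if not commit_authors:
        raise ValueError("file has no commits")

    counts = Counter(commit_authors)
    total = sum(counts.values())

    # Share of the file's commits attributed to each author.
    shares = {author: n / total for author, n in counts.items()}
    minor = [a for a, s in shares.items() if s < minor_threshold]

    return {
        "total_contributors": len(counts),
        # Share held by the most active contributor.
        "ownership": max(shares.values()),
        "minor_contributors": len(minor),
        "major_contributors": len(counts) - len(minor),
    }


if __name__ == "__main__":
    # Hypothetical history: 50 commits spread over three authors.
    authors = ["alice"] * 45 + ["bob"] * 4 + ["carol"] * 1
    print(ownership_metrics(authors))
```

In practice the author list for a file could be collected from version control history, for example with `git log --format=%an -- <path>`. Under the assumed definition, a file with high ownership and few minor contributors corresponds to the "high-level ownership" case discussed in the abstract.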