Do Software Security Practices Yield Fewer Vulnerabilities?

Due to ever-increasing security breaches, practitioners are motivated to produce more secure software. In the United States, the White House Office released a memorandum on Executive Order (EO) 14028 that mandates that organizations self-attest to their use of secure software development practices. The OpenSSF Scorecard project lets practitioners automatically measure the use of software security practices. However, little research has examined whether the use of security practices improves package security and, in particular, which security practices have the biggest impact on security outcomes. The goal of this study is to assist practitioners and researchers in making informed decisions about which security practices to adopt by developing models of the relationship between software security practice scores and security vulnerability counts. To that end, we developed five supervised machine learning models for npm and PyPI packages, using the OpenSSF Scorecard security practice scores and aggregate security scores as predictors and the number of externally reported vulnerabilities as the target variable. Our models identified four security practices (Maintained, Code Review, Branch Protection, and Security Policy) as the most important practices influencing vulnerability count. However, the models had low R² values (ranging from 9% to 12%) when we tested them on predicting vulnerability counts. Additionally, we observed that the number of reported vulnerabilities increased rather than decreased as the aggregate security score of the packages increased. Both findings indicate that additional factors may influence the package vulnerability count. We suggest that vulnerability count and security score data be refined so that these measures can provide actionable guidance on security practices.
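To make the reported R² values and the notion of a "most important practice" more concrete, the following is a minimal Python sketch under stated assumptions, not the study's implementation. It assumes an already fitted regression model exposed as a callable that maps a dict of per-check Scorecard scores (0 to 10) to a predicted vulnerability count. The names `FEATURES`, `r_squared`, and `permutation_importance` are hypothetical. Permutation importance is a swapped-in, model-agnostic technique chosen for illustration and is not necessarily the importance measure the study used.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical feature list: the four practices highlighted above, spelled as
# Scorecard check names. Illustrative only, not the study's exact columns.
FEATURES = ["Maintained", "Code-Review", "Branch-Protection", "Security-Policy"]

Row = Dict[str, float]  # one package: check name -> score (0 to 10)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((y - mean) ** 2 for y in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all target values are equal")
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot


def permutation_importance(
    model: Callable[[Row], float],
    rows: List[Row],
    target: List[float],
    seed: int = 0,
) -> Dict[str, float]:
    """Drop in R^2 when one feature's values are shuffled across packages.

    A large drop means the model relies on that practice score; a drop of
    zero means the model's predictions do not depend on it at all.
    """
    rng = random.Random(seed)
    baseline = r_squared(target, [model(r) for r in rows])
    importance: Dict[str, float] = {}
    for name in FEATURES:
        shuffled = [r[name] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row, replacing only the shuffled feature.
        permuted = [{**r, name: v} for r, v in zip(rows, shuffled)]
        importance[name] = baseline - r_squared(
            target, [model(r) for r in permuted]
        )
    return importance
```

Read this way, an R² between 0.09 and 0.12 on held-out data means the models account for only about 9% to 12% of the variance in vulnerability counts, leaving most of the variation to factors the practice scores do not capture.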