Rebuilding packages from open source is a common practice to improve the security of software supply chains, and is now done at industrial scale. The basic principle is to acquire the source code used to build a package published in a repository such as Maven Central (for Java), rebuild the package independently with hardened security, and publish it in an alternative repository. In this paper, we test the assumption that those alternative builds use the same source code. To study this, we compare the sources released with packages on Maven Central against the sources associated with independently built packages from Google's Assured Open Source and Oracle's Build-from-Source projects. We study non-equivalent sources for alternative builds of 28 popular packages with 85 releases. We investigate the causes of non-equivalence and find that the main cause is build extensions that generate code at build time, which are difficult to reproduce. We suggest strategies to address this issue.