Levels of Binary Equivalence for the Comparison of Binaries from Alternative Builds

In response to challenges in software supply chain security, several organisations have created infrastructures to independently build commodity open source projects and release the resulting binaries. Build platform variability can strengthen security because it facilitates the detection of compromised build environments. Furthermore, by improving the security posture of the build platform and collecting provenance information during the build, the resulting artifacts can be used with greater trust. Such offerings are now available from Google, Oracle and Red Hat. The availability of multiple binaries built from the same sources creates new challenges and opportunities, and raises questions such as 'Does build A confirm the integrity of build B?' or 'Can build A reveal a compromised build B?'. Answering such questions requires a notion of equivalence between binaries. We demonstrate that the obvious approach based on bitwise equality has significant shortcomings in practice, and that there is value in opting for alternative notions. We conceptualise this by introducing levels of equivalence, inspired by clone detection types. We demonstrate the value of these new levels through several experiments. We construct a dataset consisting of Java binaries built from the same sources independently by different providers, resulting in 14,156 pairs of binaries in total. We then compare the compiled class files in those jar files and find that for 3,750 pairs of jars (26.49%) at least one class file differs, which also forces the jar files and their cryptographic hashes to differ. However, based on the new equivalence levels, we can still establish that many of these pairs are practically equivalent. We evaluate several candidate equivalence relations on a semi-synthetic dataset that provides oracles consisting of pairs of binaries that either should be or must not be equivalent.
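To illustrate why jar-level hashes are a coarse signal, the following minimal Python sketch compares two jars entry by entry instead of hashing the archives as a whole. It is not the tooling used in this work; the helper names (`class_digests`, `differing_classes`, `make_jar`) and the placeholder class bytes are hypothetical and chosen only for illustration. It shows that two archives with byte-identical class files can still have different archive hashes (here because of differing entry timestamps), and that a per-class comparison pinpoints which class files actually differ.

```python
from __future__ import annotations

import hashlib
import io
import zipfile


def sha256(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def class_digests(jar_bytes: bytes) -> dict[str, str]:
    """Map each .class entry name to the SHA-256 of its uncompressed content."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return {
            name: sha256(jar.read(name))
            for name in jar.namelist()
            if name.endswith(".class")
        }


def differing_classes(jar_a: bytes, jar_b: bytes) -> set[str]:
    """Names of class files that are missing on one side or differ bitwise."""
    a, b = class_digests(jar_a), class_digests(jar_b)
    return {name for name in a.keys() | b.keys() if a.get(name) != b.get(name)}


def make_jar(entries: dict[str, bytes], timestamp: tuple) -> bytes:
    """Build an in-memory jar (zip) whose entries all carry the given timestamp."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name, data in entries.items():
            jar.writestr(zipfile.ZipInfo(name, date_time=timestamp), data)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder bytes standing in for a compiled class file (assumption).
    entries = {"com/example/Foo.class": b"\xca\xfe\xba\xbe placeholder"}
    jar_a = make_jar(entries, (2020, 1, 1, 0, 0, 0))
    jar_b = make_jar(entries, (2021, 6, 1, 12, 0, 0))
    print("archive hashes equal:", sha256(jar_a) == sha256(jar_b))
    print("differing class files:", differing_classes(jar_a, jar_b))
```

This sketch only covers bitwise equality at the level of individual class files. The higher equivalence levels introduced in this work would require comparing class files under a coarser relation than a byte-for-byte digest.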
