The disconnect between distributed software artifacts and their supposed source code enables attackers to use the build process to insert malicious functionality. Past research in this field has focused on compiled language ecosystems, mostly analysing Linux distribution packages. However, popular scripting language ecosystems may face unique issues, given the systematic differences in the artifacts they distribute. This SoK provides an overview of existing research, aiming to highlight future directions as well as opportunities to transfer existing knowledge from compiled language ecosystems. To that end, we identify key aspects of current research, systematize the challenges to software reproducibility identified so far, and map them between the ecosystems. We find that the literature is sparse and focuses on a few individual problems and ecosystems. This sparsity allows us to identify concrete next steps toward improving reproducibility in this field.