The reproducibility of software environments is a critical concern in modern software engineering, with ramifications ranging from the effectiveness of collaboration workflows to software supply chain security and scientific reproducibility. Containerization technologies such as Docker address this problem by encapsulating software environments into shareable filesystem snapshots known as images. While Docker is frequently cited in the literature as a tool that enables reproducibility in theory, the extent of its guarantees and limitations in practice remains underexplored. In this work, we address this gap through two complementary approaches. First, we conduct a systematic literature review to examine how Docker is framed in scientific discourse on reproducibility and to identify documented best practices for writing Dockerfiles that enable reproducible image builds. Second, we perform a large-scale empirical study of 5298 Docker builds collected from GitHub workflows. By rebuilding these images and comparing the results with their historical counterparts, we assess the actual reproducibility of Docker images and evaluate the effectiveness of the best practices identified in the literature.