Build scripts automate compiling source code, managing dependencies, running tests, and packaging software into deployable artifacts. These scripts are ubiquitous in modern software development pipelines, where they streamline testing and delivery. While developing build scripts, practitioners may inadvertently introduce code smells: recurring patterns of poor coding practice that may lead to build failures or increase risk and technical debt. The goal of this study is to help practitioners avoid code smells in build scripts through an empirical study of build scripts and issues on GitHub. We employed a mixed-methods approach that combines qualitative and quantitative analysis. First, we conducted a qualitative analysis of 2,000 build-script-related GitHub issues to understand recurring smells. Next, we developed a static analysis tool, Sniffer, to automatically detect code smells in 5,882 Maven, Gradle, CMake, and Make build scripts collected from 4,877 open-source GitHub repositories. To assess Sniffer's performance, we conducted a user study, in which Sniffer achieved higher precision, recall, and F-score. We identified 13 code smell categories with a total of 10,895 smell occurrences: 3,184 in Maven, 1,214 in Gradle, 337 in CMake, and 6,160 in Makefiles. Our analysis revealed that Insecure URLs was the most prevalent code smell in Maven build scripts, while Hardcoded Paths/URLs was commonly observed in both Gradle and CMake scripts. Wildcard Usage emerged as the most frequent smell in Makefiles. The co-occurrence analysis revealed strong associations between specific smell pairs, namely Hardcoded Paths/URLs with Duplicates and Inconsistent Dependency Management with Empty or Incomplete Tags, which indicate potential underlying issues in build script structure and maintenance practices.
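To make the idea of static smell detection concrete, the following minimal sketch flags two of the smells named above using regular expressions. It is an illustration only and not Sniffer's implementation: the rule patterns, the `find_smells` function, the sample scripts, and the reading of Wildcard Usage as a call to GNU Make's `$(wildcard ...)` function are all assumptions made for this example.

```python
import re

# Hypothetical rules, written for illustration only; they are not Sniffer's.
# An "insecure URL" is assumed here to be any plain http:// URL.
INSECURE_URL = re.compile(r"http://[^\s\"'<>)]+")
# "Wildcard Usage" is assumed here to mean GNU Make's $(wildcard ...) function.
MAKE_WILDCARD = re.compile(r"\$\(wildcard\s+[^)]*\)")


def find_smells(text, kind):
    """Return (line_number, smell_name, matched_text) tuples for one script.

    `kind` is a free-form label such as "maven" or "make"; only "make"
    enables the wildcard rule.
    """
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in INSECURE_URL.finditer(line):
            findings.append((number, "Insecure URL", match.group(0)))
        if kind == "make":
            for match in MAKE_WILDCARD.finditer(line):
                findings.append((number, "Wildcard Usage", match.group(0)))
    return findings


# Made-up sample inputs.
pom = """<repository>
  <url>http://repo.example.org/maven2</url>
</repository>"""

makefile = "SRCS := $(wildcard src/*.c)\n"

for finding in find_smells(pom, "maven") + find_smells(makefile, "make"):
    print(finding)
```

A line-based pattern match like this is deliberately crude. For instance, it would also flag the `http://maven.apache.org/POM/4.0.0` namespace declaration found in Maven POM files, so a practical detector would likely need format-aware parsing and exclusion rules.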