Background: Since Alitheia Core was proposed and subsequently retired, tools that support empirical studies of software projects have continued to appear, such as Codeface, Codeface4Smells, GrimoireLab and SmartSHARK, but they make different design choices and provide overlapping functionality. Aims: We seek to understand the design decisions adopted by these tools, both good and bad, along with their consequences; to understand why their authors reinvented functionality already present in other tools; and to help inform the design of future tools. Method: We used action research to evaluate the tools and to derive a set of principles and anti-patterns that motivate a new tool design. Results: We identified 7 major design choices among the tools: 1) Abstraction Debt, 2) the use of Project Configuration Files, 3) the choice of Batch or Interactive Mode, 4) Minimal Paths to Data, 5) Familiar Software Abstractions, 6) Licensing and 7) the Perils of Code Reuse. Building on the observed good and bad design decisions, we created our own tool architecture and implemented it as an R package. Conclusions: Tools should not require onerous setup for users to obtain data. Authors should consider the conventions and abstractions used by their chosen language and build upon these instead of redefining them. Tools should encourage best practices in experiment reproducibility by leveraging self-contained and readable schemas that are used for tool automation. Reuse must be done with care to avoid depending on dead code.