Building the MSR Tool Kaiaulu: Design Principles and Experiences

From arXiv. This preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections. The Version of Record of this contribution is published in 13365, and is available online at https://doi.org/10.1007/978-3-031-15116-3_6

Background: Since Alitheia Core was proposed and subsequently retired, tools that support empirical studies of software projects continue to be proposed, such as Codeface, Codeface4Smells, GrimoireLab and SmartSHARK, but they all make different design choices and provide overlapping functionality.

Aims: We seek to understand the design decisions adopted by these tools, the good and the bad, along with their consequences, to understand why their authors reinvented functionality already present in other tools, and to help inform the design of future tools.

Method: We used action research to evaluate the tools, and to determine a set of principles and anti-patterns to motivate a new tool design.

Results: We identified 7 major design choices among the tools: 1) Abstraction Debt, 2) the use of Project Configuration Files, 3) the choice of Batch or Interactive Mode, 4) Minimal Paths to Data, 5) Familiar Software Abstractions, 6) Licensing and 7) the Perils of Code Reuse. Building on the observed good and bad design decisions, we created our own tool architecture and implemented it as an R package.

Conclusions: Tools should not require onerous setup for users to obtain data. Authors should consider the conventions and abstractions used by their chosen language and build upon these instead of redefining them. Tools should encourage best practices in experiment reproducibility by leveraging self-contained and readable schemas that are used for tool automation, and reuse must be done with care to avoid depending on dead code.

