An Empirical Study of Token-based Micro Commits

In software development, developers frequently perform maintenance activities that change only a few lines of source code in a single commit. A good understanding of the characteristics of such small changes can support quality assurance approaches (e.g., automated program repair), because small changes are likely to address deficiencies in other changes; understanding why small changes are created can therefore help reveal the types of errors that were introduced. These reasons and error types can in turn be used to enhance quality assurance approaches for improving code quality. Prior studies used code churn (the number of changed lines) to characterize and investigate small changes, but such a definition has a critical limitation: it loses the information about which tokens changed within a line. For example, it fails to distinguish the following two one-line changes: (1) changing a string literal to fix a displayed message and (2) changing a function call and adding a new parameter. Both are maintenance activities, but we expect researchers and practitioners to be more interested in supporting the latter. To address this limitation, in this paper we define micro commits, a type of small change based on changed tokens. Our goal is to quantify small changes using changed tokens, which allow us to identify small changes more precisely; in particular, this token-level definition can distinguish the two changes in the example above. We study micro commits in four OSS projects to understand their characteristics, in the first empirical study of token-based micro commits. We find that micro commits mainly replace a single name or literal token, and that micro commits are more likely to be used to fix bugs. Additionally, we propose the use of token-based information to support software engineering approaches whose effectiveness is significantly affected by very small changes.
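
To make the contrast between line-level churn and token-level changes concrete, the following is a minimal sketch, not the paper's tooling. It assumes Python-like one-line changes, uses the standard library's `tokenize` and `difflib` modules, and introduces the hypothetical helpers `lex` and `token_delta` purely for illustration. The abstract does not state the token threshold that defines a micro commit, so none is assumed here.

```python
import difflib
import io
import tokenize

# Token types that carry no lexical content for this comparison.
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
         tokenize.INDENT, tokenize.DEDENT, tokenize.COMMENT}


def lex(line):
    """Split one line of Python-like code into its lexical tokens."""
    tokens = tokenize.generate_tokens(io.StringIO(line + "\n").readline)
    return [t.string for t in tokens if t.type not in _SKIP]


def token_delta(old_line, new_line):
    """Return (removed_tokens, added_tokens) between two versions of a line.

    Illustrative helper (an assumption, not the study's implementation):
    tokens are aligned with difflib.SequenceMatcher and every non-matching
    token is reported as removed or added.
    """
    old_tokens, new_tokens = lex(old_line), lex(new_line)
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed.extend(old_tokens[i1:i2])
            added.extend(new_tokens[j1:j2])
    return removed, added


# Both changes below count as "1 line removed, 1 line added" under code churn.
message_fix = token_delta('print("Helo world")', 'print("Hello world")')
call_change = token_delta("result = compute(x)", "result = evaluate(x, scale)")

print(message_fix)  # (['"Helo world"'], ['"Hello world"'])
print(call_change)  # (['compute'], ['evaluate', ',', 'scale'])
```

Under line-based churn the two changes are indistinguishable, whereas the token view shows that the first replaces a single literal token and the second replaces a name and adds an argument.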
