Mining repetitive code changes from version control history is a common way of discovering unknown change patterns. Such change patterns can be used in code recommender systems or automated program repair techniques. While such tools and datasets exist for Java, there is little work on finding and recommending such changes in Python. In this paper, we present a dataset of manually vetted, generalizable, repetitive Python code change patterns. We create a coding guideline to identify generalizable change patterns that can be used in automated tooling. We take the change patterns mined by recent work on repetitive changes in Python projects and manually review them using our coding guideline. For each change pattern, we record a description of the change and why it is applied, along with other characteristics such as the number of projects in which it occurs. This review process allows us to identify and share 72 Python change patterns that can be used to build and advance Python developer support tools.