Computing the differences between two versions of the same program is an essential task in software development and software evolution research. AST differencing is the most advanced way of doing so, and it remains an active research area. Yet AST differencing algorithms rely on configuration parameters that can strongly affect their effectiveness. In this paper, we present DAT (Diff Auto Tuning), a novel approach to hyperparameter optimization for AST differencing. We thoroughly state the problem of hyper-configuration for AST differencing. We evaluate DAT, our data-driven approach, on optimizing the edit-scripts generated by GumTree, the state-of-the-art AST differencing algorithm, in different scenarios. DAT finds a new configuration for GumTree that improves the edit-scripts in 21.8% of the evaluated cases.
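To make the idea of data-driven hyperparameter optimization concrete, the following is a minimal sketch of an exhaustive grid search over a differ's configuration space. It is not the DAT implementation. It rests on several assumptions that the abstract does not state: that a shorter edit-script counts as better, that the search space is small enough to enumerate, and that the differ is available as a caller-supplied function `script_size(config, pair)`. The names `tune`, `improved_fraction`, `script_size`, and the parameter names in the usage note are hypothetical and do not correspond to GumTree's actual API.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Sequence, Tuple, TypeVar

Config = Dict[str, float]
Pair = TypeVar("Pair")  # one (before, after) pair of program versions


def grid(space: Dict[str, List[float]]) -> Iterable[Config]:
    """Enumerate every combination of values in the search space."""
    names = sorted(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def tune(
    space: Dict[str, List[float]],
    pairs: Sequence[Pair],
    script_size: Callable[[Config, Pair], int],
    default: Config,
) -> Tuple[Config, int]:
    """Return the configuration with the smallest total edit-script size.

    Assumption: a smaller edit-script is a better one. The default
    configuration is kept unless a candidate is strictly better.
    """
    best = default
    best_total = sum(script_size(default, pair) for pair in pairs)
    for config in grid(space):
        total = sum(script_size(config, pair) for pair in pairs)
        if total < best_total:
            best, best_total = config, total
    return best, best_total


def improved_fraction(
    best: Config,
    default: Config,
    pairs: Sequence[Pair],
    script_size: Callable[[Config, Pair], int],
) -> float:
    """Fraction of pairs whose edit-script is strictly shorter under `best`."""
    shorter = sum(
        script_size(best, pair) < script_size(default, pair) for pair in pairs
    )
    return shorter / len(pairs)
```

In use, `script_size` would wrap a call to the differ under a given configuration and return the number of edit actions, and `space` would map each tunable parameter (for example, a hypothetical `min_height` or `sim_threshold`) to its candidate values. `improved_fraction` then reports the kind of figure quoted in the abstract: the share of cases in which the tuned configuration beats the default.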