Impact of Combining Syntactic and Semantic Similarities on Patch Prioritization while using the Insertion Mutation Operators

Patch prioritization ranks candidate patches by their likelihood of being correct. Fixing ingredients that are more likely to fix a bug share high contextual similarity. A recent study shows that combining syntactic and semantic similarity to capture contextual similarity improves patch prioritization. In this study, we evaluate the impact of combining syntactic and semantic features on patch prioritization when using the Insertion mutation operators, and we inspect the results of different combinations of these features. As a pilot study, the approach uses genealogical similarity to measure semantic similarity, and normalized longest common subsequence, normalized edit distance, cosine similarity, and the Jaccard similarity index to capture syntactic similarity. It also applies anti-patterns to filter out incorrect plausible patches. Combining syntactic and semantic similarity reduces the search space to a great extent, and the approach generates the correct fix for a bug before any incorrect plausible patch. We evaluate the techniques on the IntroClassJava benchmark using the Insertion mutation operators and generate fixes for 6 bugs before the incorrect plausible one. Taken together with the previous study, the approach of combining syntactic and semantic similarity fixes a total of 25 bugs from the benchmark, which is, to the best of our knowledge, more than any other approach. The correctness of the generated fixes is further checked against the publicly available results of CapGen; for the generated fixes, the approach achieves a precision of 100%.
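
As an illustration of the four syntactic measures named above, the following minimal Python sketch computes each of them over token lists and ranks candidate fixing ingredients by their score. The abstract does not state how the measures are normalized or combined, so the normalization by the longer sequence, the unweighted mean in `syntactic_score`, and all function names are assumptions made for this example. Genealogical (semantic) similarity and anti-pattern filtering are not modeled here.

```python
import math
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def edit_distance(a, b):
    """Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_lcs(a, b):
    # Assumption: normalize by the length of the longer sequence.
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def normalized_edit_similarity(a, b):
    # Assumption: turn the distance into a similarity in [0, 1].
    longest = max(len(a), len(b))
    return 1.0 - edit_distance(a, b) / longest if longest else 1.0


def cosine_similarity(a, b):
    """Cosine similarity of the token-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(a, b):
    """Jaccard index of the two token sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def syntactic_score(a, b):
    # Assumption: unweighted mean of the four measures; the study
    # evaluates different combinations, which are not reproduced here.
    return (normalized_lcs(a, b) + normalized_edit_similarity(a, b)
            + cosine_similarity(a, b) + jaccard_similarity(a, b)) / 4


def rank_candidates(context, candidates):
    """Sort candidate token lists by descending similarity to the context."""
    return sorted(candidates, key=lambda c: syntactic_score(context, c), reverse=True)


if __name__ == "__main__":
    context = "if ( x > y )".split()
    candidates = ["return 0 ;".split(), "if ( x >= y )".split()]
    for cand in rank_candidates(context, candidates):
        print(round(syntactic_score(context, cand), 3), " ".join(cand))
```

Tokenizing on whitespace keeps the sketch short; a real pipeline would presumably work on the tokens or AST nodes produced by a Java parser.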
