This paper revisits recent code similarity evaluation metrics, focusing on the application of Abstract Syntax Tree (AST) edit distance across diverse programming languages. We examine the usefulness of these metrics and compare them to traditional sequence similarity metrics. Our experiments show that AST edit distance effectively captures intricate code structures and correlates strongly with established metrics. We further analyze the strengths and weaknesses of AST edit distance and prompt-based GPT similarity scores relative to BLEU score, execution match, and Jaccard similarity. Finally, we propose, optimize, and publish an adaptable metric, an enhanced version of Tree Similarity of Edit Distance (TSED), that demonstrates effectiveness across all tested languages.
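To make the AST-edit-distance idea concrete, the sketch below computes a TSED-style score for Python snippets. It is an illustration, not the authors' implementation: it parses code with Python's standard `ast` module, uses a simplified top-down child-alignment distance (an upper bound on true tree edit distance, comparing node types only and ignoring identifier and literal values), and normalizes the distance by the larger tree's node count, which is one plausible reading of how TSED maps edit distance into a [0, 1] similarity. The function names `tree_size`, `tree_dist`, and `tsed` are hypothetical.

```python
import ast


def tree_size(node):
    """Number of AST nodes in the subtree rooted at `node`."""
    return 1 + sum(tree_size(c) for c in ast.iter_child_nodes(node))


def tree_dist(a, b):
    """Simplified top-down tree edit distance (an upper bound on the
    optimal edit distance): relabeling a root costs 1 if the node types
    differ; child lists are aligned with a classic edit-distance DP where
    inserting or deleting a child costs its whole subtree size."""
    root_cost = 0 if type(a) is type(b) else 1
    ca = list(ast.iter_child_nodes(a))
    cb = list(ast.iter_child_nodes(b))
    m, n = len(ca), len(cb)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + tree_size(ca[i - 1])
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + tree_size(cb[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + tree_size(ca[i - 1]),        # delete child
                dp[i][j - 1] + tree_size(cb[j - 1]),        # insert child
                dp[i - 1][j - 1] + tree_dist(ca[i - 1], cb[j - 1]),  # match
            )
    return root_cost + dp[m][n]


def tsed(code_a, code_b):
    """TSED-style similarity: 1 - (edit distance / larger tree size),
    clamped to [0, 1]. Identical ASTs score 1.0."""
    ta, tb = ast.parse(code_a), ast.parse(code_b)
    delta = tree_dist(ta, tb)
    return max(0.0, 1.0 - delta / max(tree_size(ta), tree_size(tb)))
```

Because only node types are compared, renaming a variable leaves the score unchanged, while structural edits (adding a loop, changing a statement kind) lower it, which is exactly the behavior that distinguishes tree-based metrics from token-level ones such as BLEU.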