Assessing source code similarity has gained significant attention in recent years because of its importance in software engineering tasks such as clone detection, code search, and recommendation. This work presents a comparative analysis of unsupervised similarity measures for source code clone detection. The goal is to provide an overview of the current state-of-the-art techniques, together with their strengths and weaknesses. To that end, we compile the existing unsupervised strategies and evaluate their performance on a benchmark dataset, in order to guide software engineers in selecting appropriate methods for their specific use cases. The source code of this study is available at https://github.com/jorge-martinez-gil/codesim