Assessing similarity in source code has gained significant attention in recent years due to its importance in software engineering tasks such as clone detection, code search, and code recommendation. This work presents a comparative analysis of unsupervised similarity measures for source code clone detection. The goal is to survey the current state-of-the-art techniques together with their strengths and weaknesses. To that end, we compile the existing unsupervised strategies and evaluate their performance on a benchmark dataset, to guide software engineers in selecting appropriate methods for their specific use cases. The source code of this study is available at \url{https://github.com/jorge-martinez-gil/codesim}.