Copy and paste is a widespread practice in software development, so duplicated and subsequently modified code occurs frequently in software projects. Because such code clones, i.e., identical or similar fragments of code, can bloat software projects and cause issues such as bug or vulnerability propagation, identifying them is important. In this paper, we present StoneDetector and its underlying method for finding code clones in Java source code and Bytecode. StoneDetector implements a conventional clone detection approach based on the textual comparison of paths derived from the code's dominator tree representation. In this way, the tool finds not only exact and syntactically similar near-miss code clones, but also code clones that are harder to detect because of their greater syntactic variation. We demonstrate StoneDetector's versatility as a conventional clone detection tool and analyze its available configuration parameters, including different string metrics and hashing algorithms, among others. In an exhaustive evaluation against other conventional clone detectors on several state-of-the-art benchmarks, we show StoneDetector's performance and scalability in finding code clones in both Java source code and Bytecode.
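To make the idea of comparing dominator tree paths textually more concrete, the following minimal Python sketch may help. It is an illustration under stated assumptions, not StoneDetector's implementation: the control flow graph encoding, the statement labels, the helper names (`dominator_tree`, `tree_paths`, `levenshtein`, `similarity`), the choice of Levenshtein distance as the string metric, and the best-match averaging are all hypothetical choices made for this example.

```python
from typing import Dict, List

def dominator_tree(cfg: Dict[str, List[str]], entry: str) -> Dict[str, List[str]]:
    """Build a dominator tree (parent -> children) with the simple iterative
    data-flow algorithm. Assumes every node is a key of cfg and is reachable
    from entry."""
    nodes = list(cfg)
    preds: Dict[str, List[str]] = {n: [] for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    # Start from "every node dominates every node" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    tree: Dict[str, List[str]] = {n: [] for n in nodes}
    for n in nodes:
        if n == entry:
            continue
        # The immediate dominator is the strict dominator closest to n,
        # i.e. the one that itself has the most dominators.
        idom = max(dom[n] - {n}, key=lambda d: len(dom[d]))
        tree[idom].append(n)
    return tree

def tree_paths(tree: Dict[str, List[str]], root: str, label: Dict[str, str]) -> List[str]:
    """Encode every root-to-leaf path of the tree as a string of node labels."""
    paths: List[str] = []
    def walk(node: str, prefix: List[str]) -> None:
        prefix = prefix + [label[node]]
        if not tree[node]:
            paths.append(" ".join(prefix))
        for child in tree[node]:
            walk(child, prefix)
    walk(root, [])
    return paths

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, kept to two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similarity(paths_a: List[str], paths_b: List[str]) -> float:
    """Average, over the paths of A, of the best normalized match in B
    (an assumed aggregation, chosen only for illustration)."""
    if not paths_a or not paths_b:
        return 0.0
    def sim(p: str, q: str) -> float:
        return 1 - levenshtein(p, q) / (max(len(p), len(q)) or 1)
    return sum(max(sim(p, q) for q in paths_b) for p in paths_a) / len(paths_a)

# Hypothetical control flow graph of a small method with an if/else.
cfg = {"entry": ["cond"], "cond": ["then", "else"],
       "then": ["exit"], "else": ["exit"], "exit": []}
# Hypothetical normalized statement kinds used as path symbols.
labels = {"entry": "decl", "cond": "if", "then": "assign",
          "else": "call", "exit": "return"}
paths = tree_paths(dominator_tree(cfg, "entry"), "entry", labels)
```

In a sketch like this, two methods would be reported as a clone pair when `similarity` of their path sets exceeds a configurable threshold. Swapping `levenshtein` for another string metric, or comparing hashes of the path strings instead of the strings themselves, would correspond to the kind of configuration parameters mentioned above.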