Plagiarism detection in programming education faces growing challenges from increasingly sophisticated obfuscation techniques, particularly automated refactoring-based attacks. While the code plagiarism detection systems used in educational practice are resilient against basic obfuscation, they struggle against structural modifications that preserve program behavior, especially those introduced by refactoring-based obfuscation. This paper presents a novel and extensible framework that enhances state-of-the-art detectors by leveraging code property graphs and graph transformations to counteract refactoring-based obfuscation. Our comprehensive evaluation on real-world student submissions, obfuscated using both algorithmic and AI-based attacks, demonstrates a significant improvement in the detection of plagiarized code.