Artificial Intelligence (AI) techniques, especially Large Language Models (LLMs), have gained popularity among researchers and software developers for generating source code. However, LLMs have been shown to generate code with quality issues and to incur copyright/licensing infringements. Detecting whether a piece of source code was written by a human or by AI has therefore become necessary. This study first presents an empirical analysis of the effectiveness of existing AI detection tools at identifying AI-generated code. The results show that they all perform poorly and lack the generalizability needed for practical deployment. To improve AI-generated code detection, we then propose a range of approaches, including fine-tuning LLMs and machine learning-based classification using static code metrics or code embeddings generated from the Abstract Syntax Tree (AST). Our best model outperforms the state-of-the-art AI-generated code detector (GPTSniffer), achieving an F1 score of 82.55. We also conduct an ablation study on our best-performing model to investigate the impact of different source code features on its performance.
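To illustrate the kind of static, AST-derived features the classification approach above could consume, the sketch below extracts a few simple metrics from a Python snippet. This is a hypothetical example, not the paper's actual pipeline; the function names and the choice of metrics are assumptions for illustration only.

```python
# Minimal sketch (hypothetical, not the paper's pipeline): derive simple
# static code metrics from a snippet's Abstract Syntax Tree (AST).
# The resulting feature vector could feed a human-vs-AI code classifier.
import ast

def _depth(node: ast.AST, d: int = 0) -> int:
    """Maximum nesting depth of an AST subtree."""
    children = list(ast.iter_child_nodes(node))
    return d if not children else max(_depth(c, d + 1) for c in children)

def ast_metrics(source: str) -> list[int]:
    """Return [node count, function count, loop count, max depth]."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    n_nodes = len(nodes)
    n_funcs = sum(isinstance(n, ast.FunctionDef) for n in nodes)
    n_loops = sum(isinstance(n, (ast.For, ast.While)) for n in nodes)
    return [n_nodes, n_funcs, n_loops, _depth(tree)]
```

Such vectors, computed over a labeled corpus of human- and AI-written snippets, would serve as input features to a standard classifier (e.g., gradient boosting or logistic regression).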