With the development of large language models, multiple AI systems for code generation (such as ChatGPT and StarCoder) are now available and widely adopted. It is often desirable to know whether a piece of code was generated by AI and, furthermore, which AI generated it. For instance, if a certain version of an AI is known to generate vulnerable code, it is particularly important to identify the creator. Existing approaches are not satisfactory because watermarking code is more challenging than watermarking text: code can be altered with relative ease via widely used code refactoring methods. In this work, we propose ACW (AI Code Watermarking), a novel method for watermarking AI-generated code. ACW is efficient, as it requires no training or fine-tuning and works in a black-box manner. It is resilient, as the watermark cannot be easily removed or tampered with through common code refactoring methods. The key idea of ACW is to selectively apply a set of carefully designed, semantics-preserving, idempotent code transformations, whose presence (or absence) allows us to determine the existence of the watermark. Our experimental results show that ACW is effective (i.e., achieving high accuracy and true positive rates, and low false positive rates), resilient, and efficient, significantly outperforming existing approaches.
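The key idea (idempotent, semantics-preserving transformations whose presence signals the watermark) can be illustrated with a minimal sketch. This is not the ACW implementation: the two rewrite rules and the names `ConstantOnRight`, `NotEqualOperator`, `embed`, and `detect` are hypothetical, chosen only for illustration, and the rules are assumed to preserve semantics for built-in types with conventional comparison behavior. Embedding applies every rule; detection checks whether each rule is already a fixed point of the code's syntax tree.

```python
import ast

# Mirror image of each comparison operator, used when operands are swapped.
_MIRROR = {ast.Lt: ast.Gt, ast.Gt: ast.Lt, ast.LtE: ast.GtE,
           ast.GtE: ast.LtE, ast.Eq: ast.Eq, ast.NotEq: ast.NotEq}


class ConstantOnRight(ast.NodeTransformer):
    """Hypothetical rule: rewrite `1 < x` as `x > 1`."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        if (len(node.ops) == 1
                and isinstance(node.left, ast.Constant)
                and not isinstance(node.comparators[0], ast.Constant)
                and type(node.ops[0]) in _MIRROR):
            # After the swap the left operand is no longer a constant,
            # so a second application changes nothing (idempotence).
            return ast.Compare(left=node.comparators[0],
                               ops=[_MIRROR[type(node.ops[0])]()],
                               comparators=[node.left])
        return node


class NotEqualOperator(ast.NodeTransformer):
    """Hypothetical rule: rewrite `not a == b` as `a != b`."""

    def visit_UnaryOp(self, node):
        self.generic_visit(node)
        inner = node.operand
        if (isinstance(node.op, ast.Not)
                and isinstance(inner, ast.Compare)
                and len(inner.ops) == 1
                and isinstance(inner.ops[0], ast.Eq)):
            return ast.Compare(left=inner.left, ops=[ast.NotEq()],
                               comparators=inner.comparators)
        return node


# Assumed watermark: the set of rules that must all be "present".
TRANSFORMS = (ConstantOnRight, NotEqualOperator)


def embed(source: str) -> str:
    """Apply every rule and return the rewritten source (Python 3.9+)."""
    tree = ast.parse(source)
    for rule in TRANSFORMS:
        tree = rule().visit(tree)
    return ast.unparse(ast.fix_missing_locations(tree))


def detect(source: str) -> bool:
    """Return True if every rule leaves the syntax tree unchanged."""
    canonical = ast.dump(ast.parse(source))
    return all(ast.dump(rule().visit(ast.parse(source))) == canonical
               for rule in TRANSFORMS)
```

Because detection compares syntax trees rather than text, whitespace or parenthesization changes do not affect the result in this sketch. One limitation is that code containing no site where a rule applies trivially passes `detect`, so a practical scheme would need some way to tell the two cases apart.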