Fingerprinting AI Coding Agents on GitHub

AI coding agents are reshaping software development through both autonomous and human-mediated pull requests (PRs). When developers use AI agents to generate code under their own accounts, attributing code authorship becomes critical for repository governance, research validity, and understanding modern development practices. We present the first study on fingerprinting AI coding agents, analyzing 33,580 PRs from five major agents (OpenAI Codex, GitHub Copilot, Devin, Cursor, Claude Code) to identify behavioral signatures. Using 41 features spanning commit messages, PR structure, and code characteristics, we achieve a 97.2% F1-score in multi-class agent identification. We uncover distinct fingerprints: Codex shows unique multiline commit patterns (67.5% feature importance), and Claude Code exhibits distinctive code structure (conditional statements account for 27.2% of feature importance). These signatures show that AI coding tools produce detectable behavioral patterns, suggesting potential for identifying AI contributions in software repositories.
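As a rough illustration of the kind of features the abstract mentions, the sketch below computes two of them from a PR: the share of multiline commit messages and the density of conditional statements in added lines. The feature names, the regular expression, and the `extract_features` helper are assumptions made for this example; the abstract does not list the 41 features or say how they are computed.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: an added diff line that starts with a Python-style
# conditional keyword. The paper's own definition is not given in the abstract.
CONDITIONAL_RE = re.compile(r"^\+\s*(?:if|elif|else)\b")


@dataclass
class PRFeatures:
    """Two illustrative features; names are assumptions, not the paper's."""
    multiline_commit_ratio: float
    conditionals_per_added_line: float


def is_multiline(message: str) -> bool:
    """True if the commit message has non-empty text after its subject line."""
    lines = message.strip().splitlines()
    return any(line.strip() for line in lines[1:])


def extract_features(commit_messages: list[str], diff: str) -> PRFeatures:
    """Compute both features from a PR's commit messages and unified diff."""
    multiline = sum(is_multiline(m) for m in commit_messages)
    ratio = multiline / len(commit_messages) if commit_messages else 0.0

    # Added lines start with "+"; "+++" marks a file header, not added code.
    added = [
        line for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    conditionals = sum(1 for line in added if CONDITIONAL_RE.match(line))
    density = conditionals / len(added) if added else 0.0

    return PRFeatures(ratio, density)
```

Feature vectors like this one could then be passed to any multi-class classifier that reports feature importances; which model the study used is not stated in the abstract.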

