AI coding agents are reshaping software development through both autonomous and human-mediated pull requests (PRs). When developers use AI agents to generate code under their own accounts, attributing code authorship becomes critical for repository governance, research validity, and understanding modern development practices. We present the first study on fingerprinting AI coding agents, analyzing 33,580 PRs from five major agents (OpenAI Codex, GitHub Copilot, Devin, Cursor, and Claude Code) to identify behavioral signatures. Using 41 features spanning commit messages, PR structure, and code characteristics, we achieve a 97.2% F1-score in multi-class agent identification. We uncover distinct fingerprints: Codex shows unique multiline commit patterns (67.5% feature importance), and Claude Code exhibits a distinctive code structure (conditional statements account for 27.2% of feature importance). These signatures show that AI coding tools produce detectable behavioral patterns, suggesting that AI contributions in software repositories can potentially be identified.
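To make the kinds of features mentioned above concrete, the following is a minimal sketch of how two of them (multiline commit messages and conditional statements in added code) might be computed for a single PR. The function name, feature names, and regular expression are illustrative assumptions; the study's actual 41 features and extraction pipeline are not specified here.

```python
import re

# Hypothetical pattern: added diff lines that open an if/elif/else branch.
# This is an illustrative assumption, not the study's feature definition.
CONDITIONAL_RE = re.compile(r"^\+\s*(?:if|elif|else)\b", re.MULTILINE)


def extract_features(commit_messages, diff_text):
    """Return a few illustrative per-PR features (names are hypothetical)."""
    n_commits = len(commit_messages)
    # A commit message counts as multiline if it has a body after the subject.
    multiline = sum(1 for msg in commit_messages if "\n" in msg.strip())
    # Added lines start with "+", excluding the "+++" file header of a diff.
    added = [
        line for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    conditionals = len(CONDITIONAL_RE.findall(diff_text))
    return {
        "multiline_commit_ratio": multiline / n_commits if n_commits else 0.0,
        "added_lines": len(added),
        "conditionals_per_added_line": conditionals / len(added) if added else 0.0,
    }


example_messages = ["Fix bug\n\nDetails here", "Update README"]
example_diff = "+++ b/a.py\n+if x:\n+    y = 1\n+else:\n+    y = 2\n-old\n"
features = extract_features(example_messages, example_diff)
```

Feature dictionaries of this kind could then be fed to any standard multi-class classifier, with per-feature importances inspected afterwards; which classifier the study uses is not stated in the abstract.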