Security in the Age of AI Teammates: An Empirical Study of Agentic Pull Requests on GitHub

Autonomous coding agents are increasingly deployed as AI teammates in modern software engineering, independently authoring pull requests (PRs) that modify production code at scale. This study systematically characterizes how autonomous coding agents contribute to software security in practice, how these security-related contributions are reviewed and accepted, and which observable signals are associated with PR rejection. We conduct a large-scale empirical analysis of agent-authored PRs using the AIDev dataset, which comprises over 33,000 curated PRs from popular GitHub repositories. Security-relevant PRs are identified using a keyword filtering strategy followed by manual validation, resulting in 1,293 confirmed security-related agentic PRs. We then analyze prevalence, acceptance outcomes, and review latency across autonomous agents, programming ecosystems, and types of code changes. In addition, we apply qualitative open coding to identify recurring security-related actions and their underlying intents, and examine review metadata to identify early signals associated with PR rejection. Security-related agentic PRs constitute a meaningful share of agent activity (approximately 4%). Rather than focusing solely on narrow vulnerability fixes, agents most frequently perform supportive security hardening activities, including testing, documentation, configuration, and improved error handling. Compared to non-security PRs, security-related agentic PRs exhibit lower merge rates and longer review latency, reflecting heightened human scrutiny, with variation across agents and programming ecosystems. PR rejection is more strongly associated with PR complexity and verbosity than with explicit security topics.
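
The keyword filtering step and the reported prevalence figure can be illustrated with a minimal sketch. The keyword list, the function names, and the sample PRs below are hypothetical and are not taken from the study; the abstract does not list the actual keywords, and in the study the keyword matches are only candidates that still go through manual validation. The total of 33,000 is used as a lower bound, since the abstract says "over 33,000".

```python
import re

# Hypothetical keyword list for illustration only; the study's real
# keyword set is not given in the abstract.
SECURITY_KEYWORDS = [
    "security", "vulnerability", "cve", "xss", "csrf",
    "sql injection", "sanitize", "authentication",
]

# One case-insensitive pattern that matches any keyword as a whole word.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in SECURITY_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def is_security_candidate(title, body):
    """Return True if the PR title or body mentions any keyword."""
    return _PATTERN.search(f"{title}\n{body}") is not None


def prevalence(n_security, n_total):
    """Fraction of PRs that are security-related."""
    return n_security / n_total


# Made-up sample PRs, not drawn from the AIDev dataset.
prs = [
    {"title": "Fix XSS in comment renderer", "body": "Escape user input."},
    {"title": "Bump version to 1.2.0", "body": "Release chores."},
    {"title": "Add retry logic", "body": "Addresses a reported CVE in the HTTP client."},
]

# Keyword hits are candidates; manual validation would follow.
candidates = [pr for pr in prs if is_security_candidate(pr["title"], pr["body"])]
print(len(candidates))

# 1,293 confirmed PRs out of at least 33,000 gives roughly 3.9%,
# consistent with the "approximately 4%" stated above.
print(round(prevalence(1293, 33000) * 100, 1))
```

Whole-word matching is used here so that short keywords such as "cve" do not fire inside unrelated words; a real pipeline would likely need a broader list and would still depend on the manual validation pass to remove false positives.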
