DARPA's AI Cyber Challenge (AIxCC, 2023--2025) is the largest competition to date for building fully autonomous cyber reasoning systems (CRSs) that leverage recent advances in AI -- particularly large language models (LLMs) -- to discover and remediate vulnerabilities in real-world open-source software. This paper presents the first systematic analysis of AIxCC. Drawing on design documents, source code, execution traces, and discussions with organizers and competing teams, we examine the competition's structure and key design decisions, characterize the architectural approaches of finalist CRSs, and analyze competition results beyond the final scoreboard. Our analysis reveals the factors that truly drove CRS performance, identifies genuine technical advances achieved by teams, and exposes limitations that remain open for future research. We conclude with lessons for organizing future competitions and broader insights for deploying autonomous CRSs in practice.