Real-world software engineering tasks require coding agents that can operate on massive repositories, sustain long-horizon sessions, and reliably coordinate complex toolchains at test time. Existing research-grade coding agents offer transparency but struggle when scaled to heavier, production-level workloads, while production-grade systems achieve strong practical performance but provide limited extensibility, interpretability, and controllability. We introduce the Confucius Code Agent (CCA), a software engineering agent that can operate on large-scale codebases. CCA is built on top of the Confucius SDK, an agent development platform structured around three complementary perspectives: Agent Experience (AX), User Experience (UX), and Developer Experience (DX). The SDK provides a unified orchestrator with advanced context management for long-context reasoning, a persistent note-taking system for cross-session continual learning, and a modular extension system for reliable tool use. In addition, we introduce a meta-agent that automates the construction, evaluation, and refinement of agents through a build-test-improve cycle, enabling rapid agent development on new tasks and tool stacks. Instantiated on the Confucius SDK using the meta-agent, CCA demonstrates strong performance on real-world software engineering tasks. On SWE-Bench-Pro, CCA achieves a Resolve@1 of 59%, exceeding both prior research baselines and commercial results under identical repositories, model backends, and tool access.
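To make the build-test-improve cycle concrete, the following is a minimal sketch of such a loop, not the Confucius SDK's actual interface. Every name here (`AgentConfig`, `build_test_improve`, `toy_evaluate`, `toy_refine`, and the tool names) is hypothetical and introduced only for illustration; the abstract does not specify how the meta-agent evaluates or refines agents, so both steps are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AgentConfig:
    """Hypothetical agent configuration: a prompt plus enabled extensions."""
    prompt: str
    extensions: List[str]


def build_test_improve(
    initial: AgentConfig,
    evaluate: Callable[[AgentConfig], float],
    refine: Callable[[AgentConfig, float], AgentConfig],
    target: float,
    max_rounds: int = 5,
) -> Tuple[AgentConfig, float]:
    """Iteratively refine a config, keeping a candidate only if it scores higher."""
    best, best_score = initial, evaluate(initial)  # build + first test
    for _ in range(max_rounds):
        if best_score >= target:
            break  # good enough, stop refining
        candidate = refine(best, best_score)  # improve
        score = evaluate(candidate)           # test
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Toy stand-ins (assumptions): the score is the fraction of needed tools enabled,
# and refinement enables one missing tool per round.
TOOLS = ["file_edit", "shell", "search"]


def toy_evaluate(cfg: AgentConfig) -> float:
    return len(set(cfg.extensions) & set(TOOLS)) / len(TOOLS)


def toy_refine(cfg: AgentConfig, score: float) -> AgentConfig:
    missing = [t for t in TOOLS if t not in cfg.extensions]
    return AgentConfig(cfg.prompt, cfg.extensions + missing[:1])


final_cfg, final_score = build_test_improve(
    AgentConfig("Fix the failing test.", []), toy_evaluate, toy_refine, target=1.0
)
```

In a real setting the evaluation step would presumably run the candidate agent on held-out tasks, and the refinement step would edit prompts or extension settings based on observed failures; the toy functions above only stand in for those steps.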