The deployment of coding agents in privacy-sensitive and resource-constrained environments is driving demand for capable open-weight Small Language Models (SLMs). However, these models suffer from a fundamental capability gap: unlike frontier large models, they lack the strong inference-time generalization needed to work with complicated, unfamiliar codebases. We identify that the prevailing Task-Centric Learning (TCL) paradigm, which scales exposure across disparate repositories, fails to address this limitation. In response, we propose Repository-Centric Learning (RCL), a paradigm shift that prioritizes vertical repository depth over horizontal task breadth: SLMs must internalize the "physics" of a target software environment through parametric knowledge acquisition, rather than attempting to recover it via costly inference-time search. Following this new paradigm, we design a four-unit Repository-Centric Experience that transforms static codebases into interactive learning signals, and use it to train SWE-Spot-4B, a family of highly compact, repository-specialized expert models. SWE-Spot-4B breaks established scaling trends, outperforming substantially larger open-weight models (e.g., CWM by Meta, Qwen3-Coder-30B) and matching or surpassing efficiency-focused commercial models (e.g., GPT-4.1-mini, GPT-5-nano) across multiple SWE tasks. Further analysis shows that RCL yields higher training sample efficiency and lower inference costs, underscoring that, for building efficient intelligence, repository mastery is a distinct and necessary dimension that complements general coding capability.