Large language models (LLMs) have shown promise in revolutionizing the field of software engineering. In particular, LLM agents are rapidly gaining momentum in software development, with practitioners reporting multifold productivity increases after adoption. Yet empirical evidence for these claims is lacking. In this paper, we estimate the causal effect of adopting a widely popular LLM agent assistant, namely Cursor, on development velocity and software quality. The estimation is enabled by a state-of-the-art difference-in-differences design that compares Cursor-adopting GitHub projects with a matched control group of similar GitHub projects that do not use Cursor. We find that the adoption of Cursor leads to a statistically significant, large, but transient increase in project-level development velocity, along with a substantial and persistent increase in static analysis warnings and code complexity. Further panel generalized-method-of-moments estimation reveals that the increases in static analysis warnings and code complexity are major factors driving the long-term velocity slowdown. Our study identifies quality assurance as a major bottleneck for early Cursor adopters and calls for it to be a first-class citizen in the design of agentic AI coding tools and AI-driven workflows.