Are Coding Agents Generating Over-Mocked Tests? An Empirical Study

Coding agents have recently seen significant adoption in software development. Unlike traditional LLM-based code completion tools, coding agents act autonomously (e.g., by invoking external tools) and leave visible traces in software repositories, such as authored commits. Among their tasks, coding agents may autonomously generate software tests; however, the quality of these tests remains uncertain. In particular, excessive use of mocking can make tests harder to understand and maintain. This paper presents the first study of mocks in agent-generated tests of real-world software systems. We analyzed over 1.2 million commits made in 2025 in 2,168 TypeScript, JavaScript, and Python repositories, including 48,563 commits by coding agents, 169,361 commits that modify tests, and 44,900 commits that add mocks to tests. Overall, we find that coding agents are more likely than non-agents to modify tests and to add mocks to tests. Specifically, we find that (1) 60% of the repositories with agent activity also contain agent test activity; (2) 23% of commits made by coding agents add or change test files, compared with 13% by non-agents; (3) 68% of the repositories with agent test activity also contain agent mock activity; (4) 36% of commits made by coding agents add mocks to tests, compared with 26% by non-agents; and (5) recently created repositories contain a higher proportion of test and mock commits made by agents. Finally, we discuss implications for developers and researchers. We call attention to the possibility that tests with mocks are easier to generate automatically (but less effective at validating real interactions), and to the need to include guidance on mocking practices in agent configuration files.
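To make concrete the concern that mocked tests may be easy to generate yet weak at validating real interactions, the following is a minimal sketch in Python using the standard `unittest.mock` module. It is not taken from the study: `TaxService`, `total_price`, and both test helpers are hypothetical names invented for illustration, and the bug in `TaxService` is planted deliberately.

```python
from unittest.mock import Mock


class TaxService:
    """Hypothetical collaborator with a planted bug (illustration only)."""

    def rate_for(self, region: str) -> float:
        # Bug: returns a percentage (20.0) where a fraction (0.20) is expected.
        return 20.0


def total_price(net: float, region: str, tax_service) -> float:
    """Return the gross price using the rate supplied by tax_service."""
    return round(net * (1 + tax_service.rate_for(region)), 2)


def mocked_test_passes() -> bool:
    """Mock-based check: the collaborator is replaced, so its bug stays hidden."""
    tax = Mock()
    tax.rate_for.return_value = 0.20  # the test author's assumption, not real behavior
    result = total_price(100.0, "EU", tax)
    tax.rate_for.assert_called_once_with("EU")  # verifies the call, not the real service
    return result == 120.0


def real_test_passes() -> bool:
    """Same check against the real collaborator: the planted bug surfaces."""
    return total_price(100.0, "EU", TaxService()) == 120.0


if __name__ == "__main__":
    print("mocked test passes:", mocked_test_passes())
    print("real-collaborator test passes:", real_test_passes())
```

In this sketch the mocked check succeeds because it only confirms that `total_price` calls `rate_for` and combines the canned value correctly, while the check against the real collaborator fails. The example illustrates one way mocking can mask a faulty interaction; it says nothing about how often this occurs in agent-generated tests.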
