Automated unit test generation has been widely studied, with Large Language Models (LLMs) recently showing significant potential. However, existing tools prioritize high code coverage, often at the expense of practical usability, correctness, and maintainability. In response, we propose Property-Based Retrieval Augmentation, a novel mechanism that extends LLM-based Retrieval-Augmented Generation (RAG) beyond basic vector-, text-similarity-, and graph-based methods. Our approach considers task-specific context and introduces a tailored property retrieval mechanism. Specifically, for unit test generation, we account for the unique structure of unit tests by dividing the generation process into Given, When, and Then phases. When generating tests for a focal method, we retrieve not only general context for the code under test but also task-specific context, such as pre-existing tests of other methods, which can provide valuable insights for any of the three phases. This establishes property relationships between the focal method and other methods, thereby expanding the scope of retrieval beyond traditional RAG. We implement this approach in a tool called APT, which sequentially performs preprocessing, property retrieval, and unit test generation, using an iterative strategy in which newly generated tests guide the creation of subsequent ones. We evaluated APT on 12 open-source projects comprising 1,515 methods; the results demonstrate that APT consistently outperforms existing tools in the correctness, completeness, and maintainability of the generated tests. Beyond unit testing, our code-context-aware retrieval mechanism extends LLM retrieval past general context, offering valuable insights and potential applications for other code-related tasks.
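The Given/When/Then decomposition of a unit test mentioned above can be illustrated with a minimal sketch. The class and test below are hypothetical examples for illustration only; they are not taken from the paper or from APT's output:

```python
# Hypothetical focal class under test (illustrative only).
class ShoppingCart:
    def __init__(self):
        self.items = []

    def add_item(self, name, price):
        if price < 0:
            raise ValueError("price must be non-negative")
        self.items.append((name, price))

    def total(self):
        # Focal method: sum of all item prices in the cart.
        return sum(price for _, price in self.items)


def test_total_sums_item_prices():
    # Given: a cart pre-populated with two items (test setup).
    cart = ShoppingCart()
    cart.add_item("pen", 2.5)
    cart.add_item("notebook", 4.0)

    # When: the focal method under test is invoked.
    result = cart.total()

    # Then: the observed behavior is asserted.
    assert result == 6.5
```

Each phase is a natural retrieval target: an existing test of a related method may contribute setup code for the Given phase, invocation patterns for the When phase, or assertion idioms for the Then phase.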