Revisiting and Improving Retrieval-Augmented Deep Assertion Generation

Unit testing validates the correctness of the unit under test and has become an essential activity in the software development process. A unit test consists of a test prefix, which drives the unit under test into a particular state, and a test oracle (e.g., an assertion), which specifies the behavior in that state. To reduce the manual effort of unit testing, Yu et al. proposed an integrated approach (integration for short) that combines information retrieval (IR) with a deep learning-based approach to generate assertions for a unit test. Although the approach is promising, there is still a knowledge gap as to why and where integration works or fails. In this paper, we present an in-depth analysis of the effectiveness of integration. Our analysis shows that: 1) the overall performance of integration is mainly due to its success in retrieving assertions; 2) integration struggles to understand the semantic differences between the retrieved focal-test (a focal-test comprises a test prefix and a unit under test) and the input focal-test; and 3) integration is limited to specific types of edit operations and cannot handle token addition or deletion. To improve the effectiveness of assertion generation, this paper proposes a novel retrieve-and-edit approach named EditAS. Specifically, EditAS first retrieves a similar focal-test from a pre-defined corpus and treats its assertion as a prototype. Then, EditAS reuses the information in the prototype and edits the prototype automatically. EditAS is more generalizable than integration. We conduct experiments on two large-scale datasets, and the results demonstrate that EditAS outperforms the state-of-the-art approaches, with average improvements of 10.00%-87.48% in accuracy and 3.30%-42.65% in BLEU score.
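The retrieve-and-edit idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`tokenize`, `retrieve`, `edit_prototype`), the Jaccard token-overlap retriever, and the toy corpus are hypothetical. In particular, the rule-based editor below is only a stand-in for a learned editor: it handles token substitution alone, so it cannot add or delete tokens.

```python
import difflib
import re


def tokenize(code):
    # Split code into identifiers, integer literals, and single symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def jaccard(a, b):
    # Token-set overlap; an assumed similarity measure for this sketch.
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def retrieve(query_focal_test, corpus):
    # corpus: list of (focal_test, assertion) pairs.
    # Returns the pair whose focal-test is most similar to the query.
    q = tokenize(query_focal_test)
    return max(corpus, key=lambda pair: jaccard(q, tokenize(pair[0])))


def edit_prototype(query_focal_test, retrieved_focal_test, prototype):
    # Toy editor (NOT a learned model): align the two focal-tests, collect
    # one-to-one token replacements, and apply them to the prototype.
    old = tokenize(retrieved_focal_test)
    new = tokenize(query_focal_test)
    mapping = {}
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for o, n in zip(old[i1:i2], new[j1:j2]):
                mapping.setdefault(o, n)  # first replacement seen wins
    return " ".join(mapping.get(t, t) for t in tokenize(prototype))


# Hypothetical two-entry corpus of (focal-test, assertion) pairs.
corpus = [
    ("stack = new Stack(); stack.push(1); size = stack.size();",
     "assertEquals(1, size)"),
    ("list = new ArrayList(); list.add(x); empty = list.isEmpty();",
     "assertFalse(empty)"),
]
query = "queue = new Queue(); queue.push(1); count = queue.size();"

retrieved_focal_test, prototype = retrieve(query, corpus)
print(edit_prototype(query, retrieved_focal_test, prototype))
# -> assertEquals ( 1 , count )
```

Here the retrieved assertion `assertEquals(1, size)` is reused as the prototype, and only the token that differs between the two focal-tests (`size` versus `count`) is rewritten. Retrieval alone would have returned the prototype unchanged.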
