Unit test cases must be modified frequently because source code, design, and requirements change continuously. Since manually maintaining test suites is tedious, time-consuming, and costly, automating the generation and maintenance of unit tests can significantly improve the effectiveness and efficiency of software testing. To this end, we propose an automated approach that exploits both structural and semantic properties of source code methods and test cases to recommend the most relevant and useful unit tests to developers. The approach first trains a neural network to transform method-level source code, as well as unit tests, into distributed representations (embedding vectors) while preserving structural information in the code. Given a method, the approach captures its semantic and structural properties in an embedding and computes the cosine similarity between that embedding and the previously embedded training instances. Based on these similarity scores, the model identifies the closest embedded methods and recommends their associated unit tests. Results on the Methods2Test dataset show that, although similar methods are not guaranteed to have similar relevant test cases, the proposed approach retrieves the most similar existing test cases for a given method in the dataset, and the evaluation shows that the recommended test cases reduce the effort developers spend writing the expected test cases.
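The retrieval step described above can be illustrated with a minimal sketch. It assumes the embeddings have already been produced by some trained encoder, which is not shown here. The names `cosine_similarity`, `recommend_tests`, and the toy vectors and test names are hypothetical and chosen only for illustration; they are not taken from the proposed implementation.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector is all zeros (an assumption made here
    to avoid division by zero).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_tests(
    query: Sequence[float],
    corpus: List[Tuple[Sequence[float], str]],
    k: int = 1,
) -> List[Tuple[float, str]]:
    """Rank (method embedding, unit test) pairs by similarity to the query.

    `corpus` stands in for the previously embedded training instances;
    each entry pairs a method embedding with its associated unit test.
    Returns the k highest-scoring (score, unit test) pairs.
    """
    scored = [(cosine_similarity(query, emb), test) for emb, test in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings; real embeddings would be much larger.
    corpus = [
        ([1.0, 0.0], "testAdd"),
        ([0.0, 1.0], "testRemove"),
        ([1.0, 1.0], "testAddAndRemove"),
    ]
    for score, test in recommend_tests([2.0, 0.1], corpus, k=2):
        print(f"{test}: {score:.3f}")
```

A linear scan like this is adequate for a small corpus. For a dataset the size of Methods2Test, an approximate nearest-neighbor index would likely be preferable, though the abstract does not say which search strategy is used.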