Software testing activities scrutinize the artifacts and behavior of a software product to find possible defects and ensure that the product meets its expected requirements. Recently, Deep Reinforcement Learning (DRL) has been successfully employed in complex testing tasks such as game testing, regression testing, and test case prioritization to automate the process and provide continuous adaptation. Practitioners can employ DRL either by implementing a DRL algorithm from scratch or by using a DRL framework. DRL frameworks offer well-maintained implementations of state-of-the-art DRL algorithms to facilitate and speed up the development of DRL applications. Developers have widely used these frameworks to solve problems in various domains, including software testing. However, to the best of our knowledge, no study has empirically evaluated the effectiveness and performance of the algorithms implemented in DRL frameworks. Moreover, the literature lacks guidelines that would help practitioners choose one DRL framework over another. In this paper, we empirically investigate the application of carefully selected DRL algorithms to two important software testing tasks: test case prioritization in the context of Continuous Integration (CI) and game testing. For the game testing task, we conduct experiments on a simple game and use DRL algorithms to explore the game and detect bugs. Results show that some of the selected DRL frameworks, such as Tensorforce, outperform recent approaches in the literature. For test case prioritization, we run experiments on a CI environment where DRL algorithms from different frameworks are used to rank the test cases. Our results show that the performance difference between the implemented algorithms is considerable in some cases, motivating further investigation.