Android Apps are frequently updated to keep up with changing user, hardware, and business demands. Ensuring the correctness of App updates through extensive testing is crucial to prevent bugs from reaching the end user. Existing Android testing tools generate GUI events that focus on improving the test coverage of the entire App rather than prioritising updates and the elements they impact. Recent research has proposed change-focused testing, but it relies on random exploration to exercise the updates and impacted GUI elements, which is ineffective and slow for large, complex Apps with a huge input exploration space. We propose directed testing of App updates with Hawkeye, which uses deep reinforcement learning on historical exploration data to prioritise GUI actions associated with code changes. Our empirical evaluation compares Hawkeye with the state-of-the-art model-based and reinforcement learning-based testing tools FastBot2 and ARES, using 10 popular open-source Apps and 1 commercial App. We find that Hawkeye generates GUI event sequences targeting changed functions more reliably than FastBot2 and ARES for the open-source Apps and the large commercial App. On smaller open-source Apps with a more tractable exploration space, Hawkeye achieves comparable performance. The industrial deployment of Hawkeye in the development pipeline also shows that Hawkeye is ideal for smoke testing merge requests of a complicated commercial App.