Mobile acceptance testing remains a bottleneck in modern software development, particularly in cross-platform development with frameworks such as Flutter. Although developers increasingly rely on automated testing tools, creating and maintaining acceptance test artifacts still demands significant manual effort. To address this, we introduce AToMIC, an automated framework that uses specialized Large Language Models to generate Gherkin scenarios, Page Objects, and executable UI test scripts directly from requirements (JIRA tickets) and recent code changes. Applied to BMW's MyBMW app, covering 13 real-world issues in a codebase of more than 170 screens, AToMIC produced executable test artifacts in under five minutes per feature on standard hardware. The generated artifacts were of high quality: 93.3% of Gherkin scenarios were syntactically correct upon generation, 78.8% of Page Objects ran without manual edits, and 100% of generated UI tests executed successfully. In a survey, all practitioners reported time savings (often a full developer-day per feature) and strong confidence in adopting the approach. These results confirm AToMIC as a scalable, practical solution for streamlining acceptance test creation and maintenance in industrial mobile projects.