Page Objects (POs) are a widely adopted design pattern for improving the maintainability and scalability of automated end-to-end web tests. However, creating and maintaining POs is still largely a manual, labor-intensive activity, and automated solutions have seen limited practical adoption. The potential of Large Language Models (LLMs) for these tasks remains largely unexplored. This paper presents an empirical study on the feasibility of using LLMs, specifically GPT-4o and DeepSeek Coder, to automatically generate POs for web testing. We evaluate the generated artifacts on an existing benchmark of five web applications for which manually written POs are available (the ground truth), focusing on two metrics: accuracy (i.e., the proportion of ground truth elements correctly identified) and element recognition rate (i.e., the proportion of ground truth elements either correctly identified or marked for modification). Our results show that LLMs can generate syntactically correct and functionally useful POs, with accuracy values ranging from 32.6% to 54.0% and an element recognition rate exceeding 70% in most cases. Our study contributes the first systematic evaluation of the strengths of LLMs and the open challenges for automated PO generation, and provides directions for further research on integrating LLMs into practical testing workflows.
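To make the two metrics concrete, the following minimal sketch computes them from per-element labels. It assumes that each ground truth element has already been assigned exactly one of three outcomes; the label names, the function `po_metrics`, and the example counts are hypothetical illustrations and are not taken from the study's tooling or data.

```python
from collections import Counter

# Hypothetical outcome labels for one ground truth element.
# The study's actual labelling scheme may use different names.
CORRECT = "correct"      # element correctly identified
TO_MODIFY = "to_modify"  # element identified but marked for modification
MISSING = "missing"      # element not identified


def po_metrics(labels):
    """Return (accuracy, element_recognition_rate) for a list of labels.

    accuracy                 = correct / total
    element recognition rate = (correct + to_modify) / total
    """
    if not labels:
        raise ValueError("at least one ground truth element is required")
    counts = Counter(labels)
    total = len(labels)
    accuracy = counts[CORRECT] / total
    recognition_rate = (counts[CORRECT] + counts[TO_MODIFY]) / total
    return accuracy, recognition_rate


# Made-up example: 10 ground truth elements, not data from the paper.
example_labels = [CORRECT] * 4 + [TO_MODIFY] * 3 + [MISSING] * 3
accuracy, recognition_rate = po_metrics(example_labels)
print(f"accuracy={accuracy:.1%}, recognition rate={recognition_rate:.1%}")
# prints: accuracy=40.0%, recognition rate=70.0%
```

By construction, the element recognition rate is never lower than the accuracy, since it counts every correctly identified element plus those marked for modification.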