Improving web element localization by using a large language model

Web-based test automation relies heavily on accurately locating web elements. Traditional methods compare element attributes but do not capture the context and meaning of elements and words. The emergence of Large Language Models (LLMs) such as GPT-4, which can show human-like reasoning abilities on some tasks, offers new opportunities for software engineering and web element localization. This paper introduces and evaluates VON Similo LLM, an enhanced web element localization approach. It uses an LLM to select the most likely web element from the top-ranked candidates identified by the existing VON Similo method, with the aim of approaching human-like selection accuracy. An experimental study was conducted using 804 web element pairs from 48 real-world web applications. We measured the number of correctly identified elements and the execution times to compare the effectiveness and efficiency of VON Similo LLM against the baseline algorithm. In addition, the motivations given by the LLM were recorded and analyzed for all instances where the original approach failed to find the correct web element. VON Similo LLM demonstrated improved performance, reducing failed localizations from 70 to 39 (out of 804), a 44 percent reduction. Despite its slower execution time and the additional cost of using the GPT-4 model, the LLM's human-like reasoning showed promise in enhancing web element localization. LLM technology can enhance web element identification in GUI test automation, reducing false positives and potentially lowering maintenance costs. However, further research is necessary to fully understand LLMs' capabilities, limitations, and practical use in GUI testing.
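
The two-stage selection described above can be illustrated with a minimal sketch. It is not the authors' implementation. The scoring function `attribute_score` is a simple exact-match stand-in for VON Similo, whose actual scoring is not given here. The shortlist size `k`, the prompt wording, and the `ask_llm` callable are hypothetical names and choices made only for illustration. The LLM is passed in as a plain function so that no particular API is assumed. The last helper reproduces the reported arithmetic: (70 - 39) / 70 is roughly 0.44.

```python
from typing import Callable, Dict, List, Sequence

# An element is modelled as a flat attribute dictionary (an assumption for this sketch).
Element = Dict[str, str]


def attribute_score(target: Element, candidate: Element) -> float:
    """Stand-in similarity: fraction of the target's attributes matched exactly.

    This is NOT the VON Similo scoring; it only provides a ranking to work with.
    """
    if not target:
        return 0.0
    matches = sum(1 for key, value in target.items() if candidate.get(key) == value)
    return matches / len(target)


def top_candidates(target: Element, candidates: Sequence[Element], k: int = 10) -> List[Element]:
    """Return the k best-scoring candidates, best first (k is an arbitrary choice)."""
    ranked = sorted(candidates, key=lambda c: attribute_score(target, c), reverse=True)
    return ranked[:k]


def build_prompt(target: Element, shortlist: Sequence[Element]) -> str:
    """Hypothetical prompt: list the target and the shortlist, ask for an index."""
    lines = ["Target element from the old page version:", repr(target),
             "Candidates from the new page version:"]
    lines += [f"{i}: {c!r}" for i, c in enumerate(shortlist)]
    lines.append("Answer with the index of the best match.")
    return "\n".join(lines)


def localize(target: Element, candidates: Sequence[Element],
             ask_llm: Callable[[str], str], k: int = 10) -> Element:
    """Shortlist by attribute score, then let the LLM pick from the shortlist.

    Falls back to the top-ranked candidate if the reply is not a valid index.
    """
    shortlist = top_candidates(target, candidates, k)
    if not shortlist:
        raise ValueError("no candidates to choose from")
    reply = ask_llm(build_prompt(target, shortlist))
    try:
        index = int(reply.strip())
    except ValueError:
        return shortlist[0]
    if 0 <= index < len(shortlist):
        return shortlist[index]
    return shortlist[0]


def relative_reduction(before: int, after: int) -> float:
    """Relative reduction in failures, e.g. from 70 to 39 failed localizations."""
    return (before - after) / before
```

Injecting `ask_llm` keeps the sketch runnable without network access and makes the fallback path easy to exercise. In practice the callable would wrap whichever model client is in use.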
