Usability describes the quality attributes of an application's user interface that determine how effectively users can interact with it. Traditional usability evaluation methods require considerable expertise and resources, which is a particular challenge for small teams and organizations. Automating usability evaluation could make it more accessible and help improve the user experience. The recent emergence of powerful multimodal large language models (MLLMs) has opened new opportunities for automating usability evaluation and recommending improvements. These models can process visual inputs such as images and videos alongside textual context, which enables them to identify usability issues and generate actionable suggestions for resolving them. In this paper, we present a novel automated approach that provides limited application context and screen recordings of user interactions as input to an MLLM. The model automatically identifies and describes usability issues based on Nielsen's usability heuristics and provides corresponding explanations and improvement recommendations. To spare developers the effort of manual prioritization, the recommendations are ranked by severity. We evaluated the quality and practical usefulness of the generated recommendations in a user study with software engineers, focusing on the highest-ranked suggestions provided by the model. The results demonstrate the potential of our approach to provide low-effort usability improvement recommendations. This makes it a promising complement to traditional evaluation methods, especially in settings with limited access to usability experts. The approach can thus serve as a basis for future integration into development tools to enable automated usability evaluation within software engineering workflows.
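The severity-based ranking of heuristic-tagged recommendations described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the `UsabilityIssue` record, the `rank_by_severity` helper, and the 0 to 4 severity scale are assumptions introduced here for illustration, and the MLLM call that would produce the issues is deliberately left out. Only the ten heuristic names are taken from Nielsen's published list.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Nielsen's ten usability heuristics, used here to validate issue labels.
NIELSEN_HEURISTICS = (
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
)


@dataclass(frozen=True)
class UsabilityIssue:
    """Hypothetical record for one issue reported by the model."""

    heuristic: str       # violated heuristic, one of NIELSEN_HEURISTICS
    description: str     # what was observed in the screen recording
    recommendation: str  # suggested improvement
    severity: int        # assumed scale: 0 (no problem) to 4 (most severe)

    def __post_init__(self) -> None:
        # Reject labels and scores outside the assumed schema.
        if self.heuristic not in NIELSEN_HEURISTICS:
            raise ValueError(f"unknown heuristic: {self.heuristic!r}")
        if not 0 <= self.severity <= 4:
            raise ValueError(f"severity out of range: {self.severity}")


def rank_by_severity(
    issues: Iterable[UsabilityIssue], top_k: Optional[int] = None
) -> List[UsabilityIssue]:
    """Return issues ordered from most to least severe.

    sorted() is stable, so issues with equal severity keep their input order.
    If top_k is given, only the top_k highest-ranked issues are returned.
    """
    ranked = sorted(issues, key=lambda issue: issue.severity, reverse=True)
    return ranked if top_k is None else ranked[:top_k]
```

Passing `top_k` mirrors an evaluation that considers only the highest-ranked suggestions; how many suggestions that covers is not stated here, so the parameter is left open.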