Ensuring usability is crucial for the success of mobile apps. Usability issues can compromise the user experience and negatively impact perceived app quality. This paper presents UX-LLM, a novel tool powered by a Large Vision-Language Model that predicts usability issues in iOS apps. To evaluate the performance of UX-LLM, we predicted usability issues in two open-source apps of medium complexity and asked two usability experts to assess the predictions. We also performed traditional usability testing and expert review for both apps and compared the results to those of UX-LLM. UX-LLM demonstrated precision ranging from 0.61 to 0.66 and recall between 0.35 and 0.38, indicating that it can identify valid usability issues but fails to capture the majority of them. Finally, we conducted a focus group with the development team of a capstone project building a transit app for visually impaired people. The focus group expressed positive perceptions of UX-LLM because it identified previously unknown usability issues in their app. However, they also raised concerns about its integration into the development workflow and suggested potential improvements. Our results show that UX-LLM cannot fully replace traditional usability evaluation methods but serves as a valuable supplement, particularly for small teams with limited resources. Because it can inspect the source code, it can help identify issues in less common user paths.
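For readers less familiar with the two metrics reported above, the following minimal sketch shows how precision and recall are conventionally computed from a set of predicted issues and a reference set of confirmed issues. It uses the standard definitions only; the abstract does not specify how the reference set was constructed, and the function name and issue identifiers below are hypothetical and not taken from the study.

```python
def precision_recall(predicted: set[str], confirmed: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of predicted issues against a reference set."""
    # Predicted issues that also appear in the reference set.
    true_positives = len(predicted & confirmed)
    # Precision: share of predictions that are valid issues.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of reference issues that were predicted.
    recall = true_positives / len(confirmed) if confirmed else 0.0
    return precision, recall


# Hypothetical issue identifiers, invented purely for illustration.
predicted = {"small-tap-target", "no-loading-indicator", "ambiguous-icon"}
confirmed = {
    "small-tap-target",
    "no-loading-indicator",
    "missing-undo",
    "low-contrast-text",
    "no-empty-state",
}

precision, recall = precision_recall(predicted, confirmed)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case two of three predictions are valid and two of five reference issues are found, which mirrors the pattern reported above: most predictions are valid, but most issues go undetected.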