Ensuring usability is crucial for the success of mobile apps. Usability issues can compromise user experience and negatively impact perceived app quality. This paper presents UX-LLM, a novel tool powered by a Large Vision-Language Model that predicts usability issues in iOS apps. To evaluate the performance of UX-LLM, we predicted usability issues in two open-source apps of medium complexity and asked two usability experts to assess the predictions. We also performed traditional usability testing and expert review for both apps and compared the results to those of UX-LLM. UX-LLM demonstrated precision ranging from 0.61 to 0.66 and recall between 0.35 and 0.38, indicating that it identifies valid usability issues but fails to capture the majority of them. Finally, we conducted a focus group with the development team of a capstone project building a transit app for visually impaired persons. The focus group expressed positive perceptions of UX-LLM, as it identified previously unknown usability issues in their app. However, they also raised concerns about its integration into the development workflow and suggested potential improvements. Our results show that UX-LLM cannot fully replace traditional usability evaluation methods but serves as a valuable supplement, particularly for small teams with limited resources, and, because it can inspect the source code, helps identify issues in less common user paths.
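To make the reported metrics concrete, the following minimal sketch shows how precision and recall are conventionally computed from counts of predicted and actual issues. The function name `precision_recall` and the counts are hypothetical and chosen only for illustration; they are not taken from the evaluation, and the paper's exact counting procedure may differ.

```python
def precision_recall(true_positives: int,
                     false_positives: int,
                     false_negatives: int) -> tuple[float, float]:
    """Return (precision, recall) for a set of predicted usability issues.

    true_positives:  predicted issues that experts judged valid
    false_positives: predicted issues that experts judged invalid
    false_negatives: known issues that the tool did not predict
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Guard against division by zero when nothing was predicted or known.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for illustration only (not data from the paper).
p, r = precision_recall(true_positives=8, false_positives=4, false_negatives=12)
print(f"precision={p:.2f}, recall={r:.2f}")  # prints "precision=0.67, recall=0.40"
```

Read this way, a precision above 0.6 means that most predicted issues were judged valid, while a recall below 0.4 means that most of the known issues were not predicted.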