Feedback on user interface (UI) mockups is crucial in design. However, human feedback is not always readily available. We explore the potential of using large language models for automatic feedback. Specifically, we focus on applying GPT-4 to automate heuristic evaluation, which currently entails a human expert assessing a UI's compliance with a set of design guidelines. We implemented a Figma plugin that takes in a UI design and a set of written heuristics, and renders automatically generated feedback as constructive suggestions. We assessed performance on 51 UIs using three sets of guidelines, compared GPT-4-generated design suggestions with those from human experts, and conducted a study with 12 expert designers to understand fit with existing practice. We found that GPT-4-based feedback is useful for catching subtle errors, improving text, and considering UI semantics, but that its utility decreased over successive iterations. Participants described several uses for this plugin despite its imperfect suggestions.