Automated UI evaluation can benefit the design process, for example by comparing alternative UI designs or conducting automated heuristic evaluation. LLM-based UI evaluation, in particular, holds the promise of generalizing to a wide variety of UI types and evaluation tasks. However, current LLM-based techniques do not yet match the performance of human evaluators. We hypothesize that automatic evaluation can be improved by collecting a targeted UI feedback dataset and then using it to enhance the performance of general-purpose LLMs. We present a targeted dataset of 3,059 design critiques and quality ratings for 983 mobile UIs, collected from seven experienced designers. We carried out an in-depth analysis to characterize the dataset's features. We then applied the dataset to achieve a 55% performance gain in LLM-generated UI feedback via various few-shot and visual prompting techniques. We also discuss future applications of this dataset, including training a reward model for generative UI techniques and fine-tuning a tool-agnostic multimodal LLM that automates UI evaluation.