Despite being trained on vast amounts of data, most LLMs are unable to reliably generate well-designed UIs. Designer feedback is essential to improving performance on UI generation; however, we find that existing RLHF methods based on ratings or rankings are not well-aligned with designers' workflows and ignore the rich rationale used to critique and improve UI designs. In this paper, we investigate several approaches for designers to give feedback to UI generation models, using familiar interactions such as commenting, sketching, and direct manipulation. We first perform an evaluation with 21 designers, who gave feedback using these interactions and produced 1,500 design annotations. We then use this data to finetune a series of LLMs to generate higher-quality UIs. Finally, we evaluate these models with human judges and find that our designer-aligned approaches outperform models trained with traditional ranking feedback and all tested baselines, including GPT-5.