Designers of digital solutions increasingly consult Large Language Models (LLMs) for their work. However, it remains unclear how this may affect the user experiences they produce, and no established practices exist. We investigate how design preferences expressed by LLM-driven simulation methods align with those of real users. We present a study that aggregates real-world data and design stimuli from twenty-nine preference tests conducted in practice by users of the UXtweak online research platform (n = 2073). We perform holistic multimodal simulations in which we manipulate LLM variables (model reasoning, sampling, persona type, and specificity) and assess their effects on algorithmic fidelity. Our results reveal significant and systematic discrepancies between people's real design preferences and LLM simulations, and these discrepancies are consistent across manipulations. Synthetic justifications lack genuine depth, nuance, and reasoning, which they substitute with patterns such as a focus on generic properties or specific elements, elaboration, and overpraising. By directing unique attention toward preferences within visual design stimuli, this research highlights how LLMs misrepresent perception and meaning in a context that is intuitive yet critical for design teams. The external and ecological validity of our findings is high, given their replication across a multitude of real-world studies.