Large language models (LLMs) exhibit dynamic capabilities and appear to comprehend complex and ambiguous natural language prompts. However, calibrating LLM interactions is challenging for interface designers and end-users alike. A central issue is our limited grasp of how human cognitive processes begin with a goal and form intentions for executing actions, a blind spot even in established interaction models such as Norman's gulfs of execution and evaluation. To address this gap, we theorize how end-users 'envision' translating their goals into clear intentions and craft prompts to elicit the desired LLM response. We define this process of Envisioning by highlighting three misalignments: (1) knowing whether the LLM can accomplish the task, (2) knowing how to instruct the LLM to perform the task, and (3) knowing how to evaluate whether the LLM's output meets the goal. Finally, we offer recommendations for narrowing the envisioning gulf in human-LLM interactions.