Large language models (LLMs) exhibit dynamic capabilities and appear to comprehend complex and ambiguous natural language prompts. However, calibrating LLM interactions is challenging for interface designers and end-users alike. A central issue is our limited grasp of how human cognitive processes begin with a goal and form intentions for executing actions, a blind spot even in established interaction models such as Norman's gulfs of execution and evaluation. To address this gap, we theorize how end-users 'envision' translating their goals into clear intentions and craft prompts to obtain the desired LLM response. We define a process of Envisioning by highlighting three misalignments: (1) knowing whether LLMs can accomplish the task, (2) knowing how to instruct the LLM to do the task, and (3) knowing how to evaluate the success of the LLM's output in meeting the goal. Finally, we make recommendations to narrow the envisioning gulf in human-LLM interactions.