Generative AI tools can let people create virtual environments and scenes with natural language prompts. Yet how people will formulate such prompts is unclear, particularly when they inhabit the environment that they are designing. For instance, a person might say, "Put a chair here", while pointing at a location. If such linguistic features are common in people's prompts, we need to tune models to accommodate them. In this work, we present a Wizard-of-Oz elicitation study with 22 participants, in which we studied people's implicit expectations when verbally prompting such programming agents to create interactive VR scenes. Our findings show that people prompt with several implicit expectations: (1) that agents have an embodied knowledge of the environment; (2) that agents understand embodied prompts by users; (3) that agents can recall previous states of the scene and the conversation; and (4) that agents have a commonsense understanding of objects in the scene. Further, we found that participants prompt differently when they are prompting in situ (i.e., within the VR environment) versus ex situ (i.e., viewing the VR environment from the outside). To explore how our findings could be applied, we designed and built Oastaad, a conversational programming agent that allows non-programmers to design interactive VR experiences that they inhabit. Based on these explorations, we outline new opportunities and challenges for conversational programming agents that create VR environments.