Recent advances in multimodal large language models (LLMs) have lowered the barriers to rapidly prototyping AI-powered features via prompting, especially for mobile use cases. Despite the value of situated user feedback, soliciting early, mobile-situated user feedback on AI prototypes remains challenging. The broad scope and flexibility of LLMs mean that, for a given use-case-specific prototype, there is a crucial need to understand the wide range of in-the-wild input users are likely to provide, as well as their in-context expectations of the AI's behavior. To explore the concept of in situ AI prototyping and testing, we created MobileMaker: an AI prototyping tool that enables designers to rapidly create mobile AI prototypes that can be tested on-device, and enables testers to make on-device, in-the-field revisions of the prototype through natural language. In an exploratory study with 16 users, we examined how user feedback on prototypes created with MobileMaker compares to feedback on prototypes created with existing prototyping tools (e.g., Figma, prompt editors). We found that MobileMaker prototypes enabled more serendipitous discovery of model input edge cases, discrepancies between the AI's and the user's in-context interpretation of the task, and contextual signals missed by the AI. Furthermore, we learned that while the ability to make in-the-wild revisions led users to feel more fulfilled as active participants in the design process, it might also constrain their feedback to the subset of changes they perceived as more actionable or implementable with the prototyping tool.