The emergence of generative AI (GenAI) models, including large language models and text-to-image models, has significantly advanced human-AI collaboration, owing not only to their outstanding capabilities but, more importantly, to the intuitive communication that text prompts afford. Though intuitive, text-based instructions suffer from the ambiguity and redundancy of natural language. To address this issue, researchers have explored augmenting text-based instructions with interactions, such as direct manipulation, that help people express their intent precisely and effectively. However, the design strategies for interaction-augmented instructions have not been systematically investigated, which hinders both understanding and application. To provide a panorama of interaction-augmented instructions, we propose a framework that analyzes related tools along five dimensions: why, when, who, what, and how interactions are applied to augment text-based instructions. Notably, we identify four purposes for applying interactions: restricting, expanding, organizing, and refining text instructions. We also summarize the design paradigms for each purpose to benefit future researchers and practitioners.