Generative AI often produces results that are misaligned with user intentions, for example by resolving ambiguous prompts in unexpected ways. Despite existing approaches to clarifying intent, a major challenge remains: letting users understand and influence the AI's interpretation of their intent through simple, direct inputs that require no expertise or rigid procedures. We present ToMigo, which represents intent as design concept graphs: nodes represent choices of purpose, content, or style, while edges link them with interpretable explanations. Applied to graphic design, ToMigo infers intent from reference images and text. We derived a schema of node types and edges from pre-study data and used it to guide a multimodal large language model in generating graphs whose nodes align externally with user intent and internally with a unified design goal. This structure enables users to explore the AI's reasoning and directly manipulate the design concept. In our user studies, ToMigo received high alignment ratings and captured most user intentions well. Users reported greater control and found the interactive features (editable graphs, reflective chats, and concept-design realignment) useful for evolving and realizing their design ideas.
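To make the design concept graph more concrete, the following is a minimal Python sketch of how such a structure could be held in memory. It is an illustration under stated assumptions, not ToMigo's implementation: the class names (`Node`, `Edge`, `DesignConceptGraph`), the method names, and the example labels are all hypothetical. Only the three node categories (purpose, content, style) and the idea of edges carrying an explanation come from the abstract.

```python
from dataclasses import dataclass, field

# Node categories named in the abstract. The full schema derived from the
# pre-study is not given there, so this set is an assumption.
NODE_TYPES = {"purpose", "content", "style"}


@dataclass
class Node:
    node_id: str
    node_type: str  # one of NODE_TYPES
    label: str      # the design choice, in plain language


@dataclass
class Edge:
    source: str
    target: str
    explanation: str  # interpretable reason linking the two choices


@dataclass
class DesignConceptGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node_id, node_type, label):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = Node(node_id, node_type, label)

    def add_edge(self, source, target, explanation):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking")
        self.edges.append(Edge(source, target, explanation))

    def explanations_for(self, node_id):
        # Every explanation attached to edges that touch this node.
        return [e.explanation for e in self.edges
                if node_id in (e.source, e.target)]

    def relabel(self, node_id, new_label):
        # Stand-in for a direct user edit of one design choice.
        self.nodes[node_id].label = new_label


# Hypothetical example: a style choice justified by the purpose it serves.
graph = DesignConceptGraph()
graph.add_node("p1", "purpose", "promote a jazz concert")
graph.add_node("s1", "style", "warm, muted palette")
graph.add_edge("s1", "p1", "a warm palette suits an intimate evening event")
```

Storing the explanation on the edge rather than on either node keeps each rationale tied to a specific pair of choices, so a user inspecting one node can read every reason it is connected to the rest of the concept.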