Generative AI often produces results that are misaligned with user intentions, for example by resolving ambiguous prompts in unexpected ways. Although approaches to clarifying intent exist, a major challenge remains: letting users understand and influence the AI's interpretation of their intent through simple, direct inputs that require neither expertise nor rigid procedures. We present ToMigo, which represents intent as design concept graphs: nodes represent choices of purpose, content, or style, and edges link them with interpretable explanations. Applied to graphic design, ToMigo infers intent from reference images and text. We derived a schema of node types and edges from pre-study data and used it to guide a multimodal large language model in generating graphs whose nodes align externally with user intent and internally toward a unified design goal. This structure enables users to explore the AI's reasoning and directly manipulate the design concept. In our user studies, ToMigo received high alignment ratings and captured most user intentions well. Users reported greater control and found the interactive features (editable graphs, reflective chats, and concept-design realignment) useful for evolving and realizing their design ideas.
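To make the graph representation concrete, the following is a minimal sketch of a design concept graph, assuming only what the abstract states: nodes typed as purpose, content, or style, and edges that carry an explanation. All class names, field names, and the example values are hypothetical; the actual ToMigo schema was derived from pre-study data and may differ.

```python
from dataclasses import dataclass, field

# Node types named in the abstract; the full schema may contain more (assumption).
NODE_TYPES = {"purpose", "content", "style"}


@dataclass
class ConceptNode:
    node_id: str
    node_type: str  # one of NODE_TYPES
    choice: str     # the design choice this node stands for

    def __post_init__(self):
        if self.node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {self.node_type}")


@dataclass
class ConceptEdge:
    source: str
    target: str
    explanation: str  # human-readable reason the two choices are linked


@dataclass
class DesignConceptGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node: ConceptNode) -> None:
        self.nodes[node.node_id] = node

    def link(self, source: str, target: str, explanation: str) -> None:
        # Both endpoints must already exist in the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before linking")
        self.edges.append(ConceptEdge(source, target, explanation))

    def explanations_for(self, node_id: str) -> list:
        # Collect the explanations of every edge touching this node.
        return [e.explanation for e in self.edges
                if node_id in (e.source, e.target)]


# Hypothetical example: a poster whose style choice supports its purpose.
graph = DesignConceptGraph()
graph.add_node(ConceptNode("p1", "purpose", "promote a jazz concert"))
graph.add_node(ConceptNode("s1", "style", "warm, high-contrast palette"))
graph.link("s1", "p1", "warm tones evoke the mood of a live jazz venue")
```

Storing the explanation on the edge rather than on either node keeps each link inspectable on its own, which matches the abstract's description of edges as interpretable explanations that users can explore and edit.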