We develop a method for producing vector sketches one part at a time. To do this, we train a multi-modal language-model-based agent with a novel multi-turn process-reward reinforcement learning procedure, applied after supervised fine-tuning. Our approach is enabled by a new dataset, ControlSketch-Part, which contains rich part-level annotations for sketches. The annotations are obtained with a novel, generic automatic annotation pipeline that segments vector sketches into semantic parts and assigns paths to parts through a structured multi-stage labeling process. Our results indicate that incorporating structured part-level data and providing the agent with visual feedback throughout the process enables interpretable, controllable, and locally editable text-to-vector sketch generation.