We develop a method for producing vector sketches one part at a time. To do this, we train a multi-modal language model-based agent using a novel multi-turn process-reward reinforcement learning procedure that follows supervised fine-tuning. Our approach is enabled by a new dataset, ControlSketch-Part, which contains rich part-level annotations for sketches. The annotations are obtained with a novel, generic automatic annotation pipeline that segments vector sketches into semantic parts and assigns paths to parts through a structured multi-stage labeling process. Our results indicate that incorporating structured part-level data and providing the agent with visual feedback throughout the process enables interpretable, controllable, and locally editable text-to-vector sketch generation.