Recent advances in image generation have achieved remarkable visual quality, yet a fundamental challenge remains: can image generation be controlled at the element level, enabling intuitive modifications such as adjusting shapes, altering colors, or adding and removing objects? In this work, we address this challenge by introducing layer-wise controllable generation through simplified vector graphics (VGs). Our approach first efficiently parses images into hierarchical VG representations that are semantically aligned and structurally coherent. Building on this representation, we design a novel image synthesis framework guided by VGs, allowing users to freely modify elements and seamlessly translate these edits into photorealistic outputs. By leveraging the structural and semantic features of VGs in conjunction with noise prediction, our method provides precise control over geometry, color, and object semantics. Extensive experiments demonstrate the effectiveness of our approach in diverse applications, including image editing, object-level manipulation, and fine-grained content creation, establishing a new paradigm for controllable image generation. Project page: https://guolanqing.github.io/Vec2Pix/