Recent advances in diffusion models have significantly improved text-to-image (T2I) generation, but they often struggle to balance fine-grained precision with high-level control. Methods like ControlNet and T2I-Adapter excel at following sketches by seasoned artists but tend to be overly rigid, replicating unintentional flaws in sketches from novice users. Meanwhile, coarse-grained methods, such as sketch-based abstraction frameworks, offer more accessible input handling but lack the precise control needed for detailed, professional use. To address these limitations, we propose KnobGen, a dual-pathway framework that democratizes sketch-based image generation by seamlessly adapting to varying levels of sketch complexity and user skill. KnobGen uses a Coarse-Grained Controller (CGC) module for high-level semantics and a Fine-Grained Controller (FGC) module for detailed refinement. The relative strength of these two modules can be adjusted through our knob inference mechanism to align with the user's specific needs. This mechanism ensures that KnobGen can flexibly generate images from both novice sketches and those drawn by seasoned artists, maintaining control over the final output while preserving the natural appearance of the image, as evidenced on the MultiGen-20M dataset and a newly collected sketch dataset.
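The abstract does not specify how the knob mechanism combines the two pathways. As a rough, hypothetical illustration (the function `knob_blend` and the linear-interpolation form are our assumptions, not the paper's method), one could imagine a scalar knob interpolating between the coarse (CGC) and fine (FGC) conditioning signals:

```python
import numpy as np

def knob_blend(coarse_feat: np.ndarray, fine_feat: np.ndarray, knob: float) -> np.ndarray:
    """Hypothetical sketch: blend coarse- and fine-grained conditioning features.

    knob in [0, 1]: 0 relies entirely on the coarse (CGC) signal,
    1 entirely on the fine (FGC) signal. Illustration only; the paper's
    actual knob inference mechanism is not reproduced here.
    """
    if not 0.0 <= knob <= 1.0:
        raise ValueError("knob must lie in [0, 1]")
    return (1.0 - knob) * coarse_feat + knob * fine_feat

# A novice sketch might use a low knob value, leaning on high-level
# semantics rather than the sketch's exact (possibly flawed) strokes.
coarse = np.ones((4, 4))
fine = np.zeros((4, 4))
blended = knob_blend(coarse, fine, 0.25)
```

Under this reading, turning the knob toward 0 would suppress replication of unintentional flaws in a novice sketch, while turning it toward 1 would honor a seasoned artist's precise strokes.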