We introduce SemanticDraw, a new paradigm of interactive content creation in which high-quality images are generated in near real-time from multiple hand-drawn regions, each encoding a prescribed semantic meaning. To maximize the productivity of content creators and fully realize their artistic imagination, creative tools need both responsive interactive interfaces and fine-grained regional controls. Despite the astonishing generation quality of recent diffusion models, we find that existing approaches to regional controllability are very slow (52 seconds for a $512 \times 512$ image) and incompatible with acceleration methods such as LCM, which blocks their huge potential in interactive content creation. Motivated by this observation, we build our solution in two steps: (1) we establish compatibility between region-based controls and acceleration techniques for diffusion models, maintaining high fidelity in multi-prompt image generation with a $10\times$ reduction in the number of inference steps; (2) we increase generation throughput with our new multi-prompt stream batch pipeline, enabling low-latency generation from multiple region-based text prompts on a single RTX 2080 Ti GPU. Our framework generalizes to any existing diffusion model and acceleration scheduler, enabling sub-second (0.64 seconds) image content creation applications built on well-established image diffusion models. Our project page is: https://jaerinlee.com/research/semantic-draw.