Scene generation is crucial to many computer graphics applications. Recent advances in generative AI have streamlined sketch-to-image workflows, easing the workload of artists and designers creating scene concept art. However, these methods often struggle with complex scenes containing multiple detailed objects, sometimes missing small or uncommon instances. In this paper, we propose Training-free Triplet Tuning for Sketch-to-Scene (T3-S2S) generation, developed after a thorough review of the cross-attention mechanism. This scheme revitalizes the existing ControlNet model, enabling effective multi-instance generation through three components: prompt balance, characteristics prominence, and dense tuning. Specifically, the prompt balance module strengthens keyword representation, reducing the risk of missing critical instances. The characteristics prominence module highlights the TopK indices in each channel, ensuring that essential features are better represented according to the token sketches. Finally, dense tuning refines contour details in the attention map, compensating for instance-related regions. Experiments validate that our triplet tuning approach substantially improves existing sketch-to-image models, consistently generating detailed, multi-instance 2D images that closely adhere to the input prompts and exhibit enhanced visual quality in complex multi-instance scenes. Code is available at https://github.com/chaos-sun/t3s2s.git.
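The characteristics prominence idea described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification, not the paper's implementation: the function name, the attention-map shape `(tokens, pixels)`, and the `k`/`scale` values are all hypothetical choices made for illustration.

```python
import numpy as np

def characteristics_prominence(attn, k=3, scale=1.5):
    """Illustrative sketch: amplify the top-k activations in each
    channel of a cross-attention map so that the dominant responses
    for each token (e.g., an instance in the sketch) stand out.

    attn: array of shape (tokens, pixels); k and scale are
    hypothetical hyperparameters, not values from the paper.
    """
    out = attn.copy()
    for c in range(out.shape[0]):
        # Indices of the k largest activations in this channel.
        idx = np.argpartition(out[c], -k)[-k:]
        # Boost those activations relative to the rest.
        out[c, idx] *= scale
    return out
```

In this toy form, boosting only the TopK entries per channel sharpens the contrast between a token's strongest spatial responses and the background, which is the intuition behind ensuring essential instance features are better represented.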