2D concept art generation for 3D scenes is a crucial yet challenging task in computer graphics, as creating natural, intuitive environments still demands extensive manual effort in concept design. While generative AI has simplified 2D concept design via text-to-image synthesis, it struggles with complex multi-instance scenes and offers limited support for structured terrain layouts. In this paper, we propose a Training-free Triplet Tuning scheme for Sketch-to-Scene (T3-S2S) generation after reviewing the entire cross-attention mechanism. This scheme revitalizes the ControlNet model for detailed multi-instance generation via three key modules: Prompt Balance ensures balanced keyword representation and minimizes the risk of missing critical instances; Characteristic Priority emphasizes sketch-based features by highlighting TopK indices in feature channels; and Dense Tuning refines contour details within instance-related regions of the attention map. Leveraging the controllability of T3-S2S, we also introduce a feature-sharing strategy with dual prompt sets to generate layer-aware isometric and terrain-view representations for terrain layout. Experiments show that our sketch-to-scene workflow consistently produces multi-instance 2D scenes whose details align with the input prompts.
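The Characteristic Priority module described above emphasizes sketch-based features by highlighting TopK indices in feature channels. A minimal illustrative sketch of that idea, assuming a per-channel selection of the k largest activations that are then amplified (the function name, array layout, and `boost` factor are hypothetical, not the paper's exact formulation):

```python
import numpy as np

def topk_channel_emphasis(attn, k=8, boost=2.0):
    """Hypothetical sketch: amplify each channel's TopK activations.

    attn: array of shape (channels, tokens), e.g. cross-attention responses.
    Returns a copy in which only the k largest entries per channel are boosted,
    so sketch-aligned features dominate the remaining low responses.
    """
    out = attn.copy()
    for c in range(attn.shape[0]):
        # indices of the k largest values in this channel
        idx = np.argpartition(attn[c], -k)[-k:]
        out[c, idx] *= boost  # emphasize the TopK responses
    return out
```

In this toy version, non-TopK entries pass through unchanged, so the operation only sharpens the contrast between strong (sketch-related) and weak activations rather than discarding information.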