LooseControl: Lifting ControlNet for Generalized Depth Conditioning

We present LooseControl to allow generalized depth conditioning for diffusion-based image generation. ControlNet, the state of the art for depth-conditioned image generation, produces remarkable results but relies on having access to detailed depth maps for guidance. In many scenarios, creating such exact depth maps is challenging. This paper introduces a generalized version of depth conditioning that enables many new content-creation workflows. Specifically, we allow (C1) scene boundary control for loosely specifying scenes with only boundary conditions, and (C2) 3D box control for specifying layout locations of the target objects rather than the exact shape and appearance of the objects. Using LooseControl, along with text guidance, users can create complex environments (e.g., rooms, street views) by specifying only scene boundaries and locations of primary objects. Further, we provide two editing mechanisms to refine the results: (E1) 3D box editing enables the user to refine images by changing, adding, or removing boxes while freezing the style of the image. This yields minimal changes apart from changes induced by the edited boxes. (E2) Attribute editing proposes possible editing directions to change one particular aspect of the scene, such as the overall object density or a particular object. Extensive tests and comparisons with baselines demonstrate the generality of our method. We believe that LooseControl can become an important design tool for easily creating complex environments and can be extended to other forms of guidance channels. Code and more information are available at https://shariqfarooq123.github.io/loose-control/.
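
To make the idea behind (C1) and (C2) concrete, the following is a minimal sketch of how a loose depth condition could be rasterized from a scene boundary and a few 3D boxes. It is not the authors' implementation. It assumes a pinhole camera, boxes aligned with the camera axes, a scene boundary reduced to a single flat back wall, and it draws only the front face of each box. The function name `render_loose_depth` and all of its parameters are hypothetical.

```python
import numpy as np


def render_loose_depth(boxes, height=64, width=64, focal=64.0, wall_depth=10.0):
    """Rasterize box front faces into a coarse depth map (illustrative only).

    boxes: iterable of (x_min, x_max, y_min, y_max, z_near) in camera space,
           where z_near is the depth of the face closest to the camera.
    wall_depth: depth of a flat back wall standing in for the scene boundary
                (an assumption; real boundaries would include floor and walls).
    """
    # Start from the scene boundary: every pixel sees the back wall.
    depth = np.full((height, width), wall_depth, dtype=np.float32)
    cx, cy = width / 2.0, height / 2.0

    for x_min, x_max, y_min, y_max, z_near in boxes:
        if z_near <= 0:
            continue  # skip boxes at or behind the camera plane

        # Pinhole projection of the front face: u = focal * x / z + cx.
        u0 = int(np.floor(focal * x_min / z_near + cx))
        u1 = int(np.ceil(focal * x_max / z_near + cx))
        v0 = int(np.floor(focal * y_min / z_near + cy))
        v1 = int(np.ceil(focal * y_max / z_near + cy))

        # Clip the projected rectangle to the image bounds.
        u0, u1 = max(u0, 0), min(u1, width)
        v0, v1 = max(v0, 0), min(v1, height)
        if u0 >= u1 or v0 >= v1:
            continue  # box falls entirely outside the view

        # Z-buffer test: keep the nearest surface at each pixel.
        depth[v0:v1, u0:u1] = np.minimum(depth[v0:v1, u0:u1], z_near)

    return depth


if __name__ == "__main__":
    # Two hypothetical boxes: one centered, one nearer and offset.
    layout = [(-1.0, 1.0, -1.0, 1.0, 4.0), (0.0, 2.0, 0.0, 2.0, 2.0)]
    depth_map = render_loose_depth(layout)
    print(depth_map.shape, depth_map.min(), depth_map.max())
```

Moving, adding, or removing an entry in `layout` and re-rendering corresponds loosely to the box-level edits described in (E1). A map like this would still need to be converted to whatever depth encoding the conditioning model expects.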
