We present the Multi-Instance Generation (MIG) task: generating multiple instances under diverse controls simultaneously in a single image. Given a set of predefined coordinates and their corresponding descriptions, the task is to ensure that each generated instance is placed accurately at its designated location and that its attributes adhere to its description. This broadens the scope of current research on single-instance generation, extending it to a more versatile and practical setting. Inspired by the idea of divide and conquer, we introduce an approach named Multi-Instance Generation Controller (MIGC) to address the challenges of the MIG task. First, we break the MIG task down into several subtasks, each involving the shading of a single instance. To ensure precise shading for each instance, we introduce an instance enhancement attention mechanism. Finally, we aggregate all the shaded instances to provide the information needed to generate multiple instances accurately in Stable Diffusion (SD). To evaluate how well generation models perform on the MIG task, we provide a COCO-MIG benchmark along with an evaluation pipeline. We conduct extensive experiments on the proposed COCO-MIG benchmark as well as on several commonly used benchmarks. The evaluation results show that our model provides strong control over quantity, position, attribute, and interaction.
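The divide-and-conquer structure described above (split into per-instance subtasks, shade each instance, then aggregate) can be illustrated with a toy sketch. This is not the MIGC implementation: the names `Instance`, `shade_instance`, `aggregate`, and `generate_layout` are hypothetical, the "feature" is a placeholder scalar derived from the description length rather than a text-conditioned attention output, and the averaging rule for overlapping boxes is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Grid = List[List[float]]


@dataclass
class Instance:
    """One MIG input item: a box and its text description (hypothetical type)."""
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive
    description: str


def shade_instance(inst: Instance, height: int, width: int) -> Tuple[Grid, Grid]:
    """Toy single-instance subtask: return (mask, feature) for one instance."""
    x0, y0, x1, y1 = inst.box
    mask = [
        [1.0 if (x0 <= x < x1 and y0 <= y < y1) else 0.0 for x in range(width)]
        for y in range(height)
    ]
    # Placeholder for a text-conditioned feature (assumption, not the real model).
    value = float(len(inst.description))
    feature = [[value * m for m in row] for row in mask]
    return mask, feature


def aggregate(shaded: List[Tuple[Grid, Grid]], height: int, width: int) -> Grid:
    """Merge per-instance results by averaging wherever masks overlap."""
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = sum(mask[y][x] for mask, _ in shaded)
            if total > 0:
                out[y][x] = sum(feat[y][x] for _, feat in shaded) / total
    return out


def generate_layout(instances: List[Instance], height: int, width: int) -> Grid:
    """Divide (one subtask per instance), then conquer (aggregate)."""
    shaded = [shade_instance(inst, height, width) for inst in instances]
    return aggregate(shaded, height, width)
```

In this sketch each instance is processed independently of the others, and only the aggregation step sees all of them, which mirrors the split between single-instance shading and the final combination described in the abstract.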