This paper introduces a novel unified representation for diffusion models that supports both image generation and segmentation. Specifically, we represent entity-level masks as a colormap, which handles a varying number of entities while keeping the representation closely aligned with the image RGB domain. We propose two novel modules to support this mask representation: a location-aware color palette and a progressive dichotomy module. The location-aware palette keeps each entity's color consistent with its location, while the progressive dichotomy module efficiently decodes the synthesized colormap into high-quality entity-level masks through a depth-first binary search, without knowing the number of clusters in advance. To address the scarcity of large-scale segmentation training data, we employ an inpainting pipeline and further improve the flexibility of diffusion models across various tasks, including inpainting, image synthesis, referring segmentation, and entity segmentation. Comprehensive experiments validate the efficiency of our approach, demonstrating segmentation mask quality comparable to state-of-the-art methods and adaptability to multiple tasks. The code will be released at \href{https://github.com/qqlu/Entity}{https://github.com/qqlu/Entity}.
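The abstract does not specify how the palette maps locations to colors or when the progressive dichotomy module stops splitting. The following minimal pure-Python sketch illustrates the general idea under explicit assumptions: `location_aware_color` (normalized centroid mapped to red/green with a fixed blue), the 2-means binary split, and the `threshold` stopping rule on color spread are all hypothetical choices, not the paper's implementation. It shows how a depth-first sequence of binary splits can recover entity masks from a colormap without being told how many entities it contains.

```python
import math
from typing import List, Optional, Sequence, Tuple

Color = Tuple[float, float, float]


def location_aware_color(cx: float, cy: float) -> Tuple[int, int, int]:
    """Hypothetical location-aware palette: map an entity's normalized centroid
    (cx, cy) in [0, 1]^2 to an RGB color, so the color is fixed by location."""
    return (round(255 * cx), round(255 * cy), 128)


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(colors: Sequence[Sequence[float]]) -> Color:
    n = len(colors)
    return tuple(sum(c[k] for c in colors) / n for k in range(3))


def _two_means(colors, iters: int = 10) -> Optional[List[List[int]]]:
    """Assumed split rule: a few Lloyd (2-means) iterations. Returns two lists
    of local indices, or None if no split with two non-empty halves exists."""
    m = _mean(colors)
    a = max(colors, key=lambda c: _dist2(c, m))  # farthest color from the mean
    b = max(colors, key=lambda c: _dist2(c, a))  # farthest color from a
    centers = [a, b]
    groups = None
    for _ in range(iters):
        new = [[], []]
        for i, c in enumerate(colors):
            new[0 if _dist2(c, centers[0]) <= _dist2(c, centers[1]) else 1].append(i)
        if not new[0] or not new[1]:
            break  # keep the last split in which both halves are non-empty
        groups = new
        centers = [_mean([colors[i] for i in g]) for g in groups]
    return groups


def progressive_dichotomy(colors, threshold: float = 20.0) -> List[List[int]]:
    """Decode a flattened colormap (one RGB tuple per pixel) into entity masks
    by depth-first binary splitting; the number of entities is never given."""
    if not colors:
        return []
    masks = []
    stack = [list(range(len(colors)))]  # LIFO stack gives depth-first order
    while stack:
        idxs = stack.pop()
        sub = [colors[i] for i in idxs]
        m = _mean(sub)
        spread = max(math.sqrt(_dist2(c, m)) for c in sub)
        split = _two_means(sub) if spread > threshold else None
        if split is None:  # compact enough (or unsplittable): emit one mask
            masks.append(sorted(idxs))
            continue
        # Push the second half first so the first half is explored next.
        stack.append([idxs[i] for i in split[1]])
        stack.append([idxs[i] for i in split[0]])
    return masks


def build_demo_colormap(h: int = 4, w: int = 4):
    """Toy colormap with three entities: A (left), B (top-right), C (bottom-right)."""
    def entity(r, c):
        return "A" if c < 2 else ("B" if r < 2 else "C")

    pixels = {}
    for r in range(h):
        for c in range(w):
            pixels.setdefault(entity(r, c), []).append((r, c))
    palette = {}
    for e, pts in pixels.items():
        cx = sum((c + 0.5) / w for _, c in pts) / len(pts)
        cy = sum((r + 0.5) / h for r, _ in pts) / len(pts)
        palette[e] = location_aware_color(cx, cy)
    colormap = []
    for p in range(h * w):
        red, green, blue = palette[entity(p // w, p % w)]
        colormap.append((red + (p % 3) - 1, green, blue))  # small synthetic noise
    return colormap


if __name__ == "__main__":
    print(progressive_dichotomy(build_demo_colormap()))
```

Because splitting stops once a group's color spread falls to `threshold` or below, the number of masks emerges from the data rather than being supplied. In this sketch the threshold must sit between the within-entity color noise and the smallest color gap between entities; how the actual module sets this criterion is not stated in the abstract.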