Discrete diffusion models have achieved success in tasks like image generation and masked language modeling but face limitations in controlled content editing. We introduce DICE (Discrete Inversion for Controllable Editing), the first approach to enable precise inversion for discrete diffusion models, including multinomial diffusion and masked generative models. By recording noise sequences and masking patterns during the reverse diffusion process, DICE enables accurate reconstruction and flexible editing of discrete data without the need for predefined masks or attention manipulation. We demonstrate the effectiveness of DICE across both image and text domains, evaluating it on models such as VQ-Diffusion, Paella, and RoBERTa. Our results show that DICE preserves high data fidelity while enhancing editing capabilities, offering new opportunities for fine-grained content manipulation in discrete spaces.
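The idea of recording noise during the reverse process can be illustrated with a toy example. The sketch below is not the DICE algorithm itself: it assumes categorical sampling via the Gumbel-max trick, uses a hypothetical `toy_logits` function in place of a trained denoiser such as VQ-Diffusion or RoBERTa, and records noise while sampling rather than inverting a given input. All function names and parameters are illustrative. It shows only that replaying the recorded noise reproduces the same discrete sequence, and that shifting the logits while reusing that noise yields an edited sequence.

```python
import math
import random


def toy_logits(tokens, step, vocab_size):
    """Hypothetical stand-in for a trained denoiser (assumption, not a real model).

    Returns one row of fixed pseudo-random logits per token, depending only on
    the current token and the step index, so repeated calls are deterministic.
    """
    rng = random.Random(1000 + step)
    table = [[rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
             for _ in range(vocab_size)]
    return [table[tok] for tok in tokens]


def gumbel(rng):
    """Draw one standard Gumbel sample; clamp u away from 0 to avoid log(0)."""
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def sample_and_record(x_start, num_steps, vocab_size, seed=0):
    """Run a toy reverse process and record the Gumbel noise used at each step."""
    rng = random.Random(seed)
    x, recorded = list(x_start), []
    for step in range(num_steps, 0, -1):
        logits = toy_logits(x, step, vocab_size)
        noise = [[gumbel(rng) for _ in range(vocab_size)] for _ in x]
        recorded.append(noise)
        # Gumbel-max trick: argmax(logits + noise) is a categorical sample.
        x = [argmax([l + g for l, g in zip(row, n_row)])
             for row, n_row in zip(logits, noise)]
    return x, recorded


def replay(x_start, recorded, vocab_size, logit_shift=None):
    """Re-run the reverse process with recorded noise, optionally shifting logits."""
    x = list(x_start)
    num_steps = len(recorded)
    for i, noise in enumerate(recorded):
        step = num_steps - i
        logits = toy_logits(x, step, vocab_size)
        if logit_shift is not None:
            # Stand-in for an edit signal (e.g. a changed prompt): bias the logits.
            logits = [[l + s for l, s in zip(row, logit_shift)] for row in logits]
        x = [argmax([l + g for l, g in zip(row, n_row)])
             for row, n_row in zip(logits, noise)]
    return x


x_T = [0, 1, 2, 3, 4, 5]
x_0, noise = sample_and_record(x_T, num_steps=4, vocab_size=8)
reconstruction = replay(x_T, noise, vocab_size=8)
assert reconstruction == x_0  # same noise + same logits gives the same tokens
```

In this toy setting, reconstruction is exact because the sampler is deterministic once the noise is fixed. Passing a non-zero `logit_shift` to `replay` changes the output while still reusing the recorded noise, which is a rough analogue of editing with preserved structure.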