We introduce the Glauber Generative Model (GGM), a new class of discrete diffusion models that generates new samples from a distribution over a discrete space, given samples from it. GGM uses a discrete Markov chain called the heat bath dynamics (or Glauber dynamics) to denoise a sequence of noisy tokens into a sample from a joint distribution over discrete tokens. Our conceptual framework provides an exact reduction of the task of learning the denoising Markov chain to solving a class of binary classification tasks: the model learns to classify a given token in a noisy sequence as signal or noise. In contrast, prior works on discrete diffusion models either solve regression problems to learn importance ratios or minimize loss functions given by variational approximations. We apply GGM to language modeling and to image generation, where images are discretized using image tokenizers such as VQGANs. GGM outperforms existing discrete diffusion models in language generation and demonstrates strong performance in image generation without dataset-specific image tokenizers. We also show that our model performs well in zero-shot control settings such as text and image infilling.
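To make the heat bath (Glauber) dynamics mentioned above concrete, here is a minimal textbook sketch of one heat-bath update on a 1D Ising chain: pick a coordinate uniformly at random and resample it from its conditional distribution given the other coordinates. This is the classical chain, not the paper's learned denoiser; the energy model and all parameter names here are illustrative assumptions.

```python
import math
import random

def glauber_step(spins, beta, rng):
    """One heat-bath (Glauber) update: choose a site uniformly and
    resample it from its conditional distribution given its neighbors."""
    n = len(spins)
    i = rng.randrange(n)
    # Local field: sum of nearest-neighbor spins on a 1D ring
    # (illustrative nearest-neighbor Ising energy, an assumption here).
    field = spins[(i - 1) % n] + spins[(i + 1) % n]
    # Heat-bath rule: conditional probability of spin +1 given the rest.
    p_up = 1.0 / (1.0 + math.exp(-2.0 * beta * field))
    spins[i] = 1 if rng.random() < p_up else -1

rng = random.Random(0)
spins = [rng.choice([-1, 1]) for _ in range(16)]
for _ in range(1000):
    glauber_step(spins, beta=1.0, rng=rng)
```

In GGM the same coordinate-wise resampling idea is applied to token sequences, with the conditional resampling step replaced by a learned signal-vs-noise binary classifier rather than a known energy model.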