The design space of discrete-space diffusion or flow generative models is significantly less well understood than that of their continuous-space counterparts, with many works focusing only on a simple masked construction. In this work, we aim to take a holistic approach to the construction of discrete generative models based on continuous-time Markov chains and, for the first time, allow the use of arbitrary discrete probability paths, or colloquially, corruption processes. Through the lens of optimizing the symmetric kinetic energy, we propose velocity formulas that can be applied to any given probability path, completely decoupling the probability path from the velocity and giving the user the freedom to specify any desirable probability path based on expert knowledge specific to the data domain. Furthermore, we find that a special construction of mixture probability paths optimizes the symmetric kinetic energy in the discrete case. We empirically validate the usefulness of this new design space across multiple modalities: text generation, inorganic material generation, and image generation. We find that kinetic-optimal mixture paths can outperform the mask construction even in text, while in the visual domain we can make use of domain-specific constructions of the probability path.