The generative modeling of data on manifolds is an important task, for which diffusion models in flat spaces typically need nontrivial adaptations. This article demonstrates how a technique called `trivialization' can transfer the effectiveness of diffusion models in Euclidean spaces to Lie groups. In particular, an auxiliary momentum variable is algorithmically introduced to help transport the position variable between the data distribution and a fixed, easy-to-sample distribution. Normally, this would incur further difficulty for manifold data because momentum lives in a space that changes with the position. However, our trivialization technique creates a new momentum variable that stays in a simple $\textbf{fixed vector space}$. This design, together with a manifold-preserving integrator, simplifies implementation and avoids the inaccuracies introduced by approximations such as projections onto the tangent space and the manifold, which were typically used in prior work, hence facilitating generation with high fidelity and efficiency. The resulting method achieves state-of-the-art performance on protein and RNA torsion angle generation and on sophisticated torus datasets. We also, arguably for the first time, tackle the generation of data on high-dimensional Special Orthogonal and Unitary groups, the latter essential for quantum problems.
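To make the idea of a momentum variable in a fixed vector space concrete, the following is a minimal sketch on SO(3), not the implementation from this work. It assumes forward (noising) dynamics of kinetic Langevin type, $\dot g = g\,\hat\xi$ and $d\xi = -\gamma\,\xi\,dt + \sqrt{2\gamma}\,dW$, where the momentum $\xi$ lives in $\mathbb{R}^3 \cong \mathfrak{so}(3)$ regardless of the position $g$. The splitting scheme, the function names (`forward_step`, `simulate_forward`, `orthogonality_error`), and all parameter values are illustrative assumptions. The position update multiplies by a closed-form matrix exponential, so $g$ stays on the group up to floating-point round-off and no projection is needed.

```python
import math
import random

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def hat(w):
    """Identify a vector in R^3 with a skew-symmetric matrix in so(3)."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def exp_so3(w):
    """Rodrigues' formula: closed-form matrix exponential of hat(w)."""
    theta = math.sqrt(sum(c * c for c in w))
    if theta == 0.0:
        return [row[:] for row in IDENTITY]
    K = hat(w)
    K2 = matmul(K, K)
    a = math.sin(theta) / theta
    # Numerically stable form of (1 - cos(theta)) / theta^2.
    b = 2.0 * (math.sin(theta / 2.0) / theta) ** 2
    return [[IDENTITY[i][j] + a * K[i][j] + b * K2[i][j] for j in range(3)]
            for i in range(3)]

def forward_step(g, xi, h, gamma, rng):
    """One step of an assumed splitting scheme with step size h."""
    # Momentum: exact Ornstein-Uhlenbeck update in the fixed space R^3.
    decay = math.exp(-gamma * h)
    noise_scale = math.sqrt(1.0 - decay * decay)
    xi = [decay * c + noise_scale * rng.gauss(0.0, 1.0) for c in xi]
    # Position: right-multiply by exp(h * hat(xi)), which is a rotation,
    # so g remains in SO(3) without any projection step.
    g = matmul(g, exp_so3([h * c for c in xi]))
    return g, xi

def simulate_forward(n_steps=200, h=0.05, gamma=1.0, seed=0):
    """Run the sketch from the identity rotation with zero momentum."""
    rng = random.Random(seed)
    g = [row[:] for row in IDENTITY]
    xi = [0.0, 0.0, 0.0]
    for _ in range(n_steps):
        g, xi = forward_step(g, xi, h, gamma, rng)
    return g, xi

def orthogonality_error(g):
    """Largest entry of |g^T g - I|, a measure of drift off the group."""
    gt = [[g[j][i] for j in range(3)] for i in range(3)]
    gtg = matmul(gt, g)
    return max(abs(gtg[i][j] - IDENTITY[i][j])
               for i in range(3) for j in range(3))
```

Calling `simulate_forward()` and then `orthogonality_error` on the returned matrix gives a quick check that the iterate remains a rotation after many steps. A learned score term for the reverse (generative) dynamics is deliberately left out of this sketch.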