In this paper, we study an under-explored yet important factor in diffusion generative models: combinatorial complexity. Data samples are generally high-dimensional, and in many structured generation tasks they are further associated with combinations of additional attributes. We show that the space spanned by these combinations of dimensions and attributes is insufficiently sampled by the existing training schemes of diffusion generative models, degrading test-time performance. We present a simple fix by constructing stochastic processes that fully exploit the combinatorial structures, hence the name ComboStoc. With this simple strategy, network training is significantly accelerated across diverse data modalities, including images and 3D structured shapes. Moreover, ComboStoc enables a new mode of test-time generation that uses asynchronous time steps for different dimensions and attributes, allowing varying degrees of control over them.
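To make the core idea concrete, the following is a minimal sketch of what per-dimension (asynchronous) time steps could look like in a rectified-flow-style interpolation between noise and data. The function name `combostoc_interpolate` and all details are illustrative assumptions, not the authors' actual implementation; the point is only that each dimension receives its own independently sampled time, so training visits combinations of partially noised coordinates that a single shared scalar time would never produce.

```python
import numpy as np

def combostoc_interpolate(x0, x1, rng):
    """Illustrative sketch (hypothetical API, not the paper's code):
    instead of one scalar time t shared by all dimensions, sample an
    independent time per dimension/attribute, so training covers the
    combinatorial space of partially noised states."""
    # x0: noise sample, x1: data sample, both of shape (d,)
    t = rng.uniform(0.0, 1.0, size=x1.shape)  # one time per dimension
    xt = (1.0 - t) * x0 + t * x1              # elementwise linear interpolation
    return xt, t

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)   # stand-in for noise
x1 = rng.standard_normal(4)   # stand-in for a data sample
xt, t = combostoc_interpolate(x0, x1, rng)
print(xt.shape, t.shape)
```

With a shared scalar time, every coordinate of `xt` would sit at the same blend ratio; here each coordinate can be at a different point on its noise-to-data path, which is also what permits test-time control by advancing chosen dimensions or attributes faster than others.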