In this paper, we study an under-explored but important factor of diffusion generative models: combinatorial complexity. Data samples are generally high-dimensional, and in many structured generation tasks additional attributes are associated with each sample. We show that existing training schemes for diffusion generative models can insufficiently cover the space spanned by combinations of dimensions and attributes, potentially limiting test-time performance. We present a simple fix: constructing stochastic processes that fully exploit the combinatorial structures, hence the name ComboStoc. Using this strategy, we show that network training is significantly accelerated across diverse data modalities, including images and 3D structured shapes. Moreover, ComboStoc enables a new mode of test-time generation that uses asynchronous time steps for different dimensions and attributes, allowing varying degrees of control over them. Our code is available at: https://github.com/Xrvitd/ComboStoc
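To make the notion of asynchronous time steps concrete, the following is a minimal sketch rather than the released implementation. It assumes a linear interpolation between a data sample and Gaussian noise, which the abstract does not specify, and the helper names `interpolate`, `sample_sync_t`, and `sample_async_t` are hypothetical. The synchronized variant draws one time step per sample and shares it across all dimensions; the asynchronous variant draws an independent time step for every dimension, so a batch covers many more combinations of per-dimension noise levels.

```python
import numpy as np


def interpolate(x_data, noise, t):
    """Linear path between data (t = 0) and noise (t = 1).

    Assumption: the abstract does not fix the interpolation; this linear
    form is used for illustration only. `t` broadcasts against `x_data`.
    """
    return (1.0 - t) * x_data + t * noise


def sample_sync_t(shape, rng):
    """One scalar time step per sample, shared by all of its dimensions."""
    t = rng.uniform(size=(shape[0], 1))
    return np.broadcast_to(t, shape)


def sample_async_t(shape, rng):
    """An independent time step for every dimension of every sample."""
    return rng.uniform(size=shape)


rng = np.random.default_rng(0)
x_data = rng.normal(size=(4, 8))   # toy batch: 4 samples, 8 dimensions
noise = rng.normal(size=(4, 8))

t_sync = sample_sync_t(x_data.shape, rng)
t_async = sample_async_t(x_data.shape, rng)

x_sync = interpolate(x_data, noise, t_sync)     # every dimension equally noisy
x_async = interpolate(x_data, noise, t_async)   # noise level varies per dimension
```

Under the same assumptions, fixing some entries of `t` at 0 while others take larger values keeps those dimensions at their data values and leaves the rest noisy, which is one way to picture the varying degrees of control mentioned above.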