Creating large-scale datasets for training high-performance generative models is often prohibitively expensive, especially when associated attributes or annotations must be provided. As a result, merging existing datasets has become a common strategy. However, the attribute sets of different datasets are often inconsistent, and naively concatenating them typically leads to block-wise missing conditions. This poses a significant challenge for conditional generative modeling when multiple attributes are used jointly as conditions, limiting the model's controllability and applicability. To address this issue, we propose a novel generative approach, Diffusion Model with Double Guidance, which enables precise conditional generation even when no training sample contains all conditions simultaneously. Our method maintains rigorous control over multiple conditions without requiring joint annotations. We demonstrate its effectiveness on molecular and image generation tasks, where it outperforms existing baselines both in alignment with the target conditional distributions and in controllability under missing-condition settings.
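To make the block-wise missing pattern concrete, the following is a minimal sketch of what naive concatenation does to the condition table. It is an illustration only, not part of the proposed method: the function name `merge_with_missing_blocks`, the toy values, and the use of NaN to mark unobserved conditions are all assumptions introduced here.

```python
import numpy as np


def merge_with_missing_blocks(attrs_a, attrs_b):
    """Stack two attribute tables that annotate different conditions.

    attrs_a: (n_a, d_a) array, conditions annotated only in dataset A.
    attrs_b: (n_b, d_b) array, conditions annotated only in dataset B.
    Returns an (n_a + n_b, d_a + d_b) array where NaN marks a condition
    that was never annotated for that sample (hypothetical convention).
    """
    n_a, d_a = attrs_a.shape
    n_b, d_b = attrs_b.shape
    merged = np.full((n_a + n_b, d_a + d_b), np.nan)
    merged[:n_a, :d_a] = attrs_a  # block observed only in dataset A
    merged[n_a:, d_a:] = attrs_b  # block observed only in dataset B
    return merged


# Hypothetical toy data: dataset A annotates one attribute, dataset B another.
attrs_a = np.array([[0.1], [0.2], [0.3]])
attrs_b = np.array([[1.0], [2.0]])

merged = merge_with_missing_blocks(attrs_a, attrs_b)
observed = ~np.isnan(merged)             # True where a condition is available
fully_annotated = observed.all(axis=1)   # rows carrying every condition

print(merged)
print("samples with all conditions:", int(fully_annotated.sum()))
```

In this toy setup every sample carries exactly one of the two conditions, so no row is fully annotated. This is the setting described above, in which a model must be conditioned on both attributes jointly although no training sample provides them together.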