Disentangled representation learning (DRL) aims to break down observed data into its core intrinsic factors for a deeper understanding of the data. In real-world scenarios, manually defining and labeling these factors is non-trivial, making unsupervised methods attractive. Recently, there have been a few explorations of diffusion models (DMs), already mainstream in generative modeling, for unsupervised DRL. These methods impose their own inductive biases to ensure that each latent unit fed to the DM expresses only one distinct factor. In this context, we design Dynamic Gaussian Anchoring to enforce attribute-separated latent units for more interpretable DRL. This unconventional inductive bias explicitly delineates the decision boundaries between attributes while also promoting independence among latent units. We also propose the Skip Dropout technique, which easily modifies the denoising U-Net to be more DRL-friendly, addressing its uncooperative nature with the disentangling feature extractor. Our methods, which carefully consider latent-unit semantics and the distinct DM structure, make DM-based disentangled representations more practical, achieving state-of-the-art disentanglement performance on both synthetic and real data as well as advantages in downstream tasks.
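To make the anchoring idea concrete, the following is a minimal toy sketch, not the paper's implementation: it assumes each attribute value is represented by the mean of an isotropic Gaussian anchor, and shows only the assignment step, where a latent code is snapped to its most responsible anchor (the "dynamic" updating of anchors during training is omitted). The function name, shapes, and `sigma` parameter are all illustrative assumptions.

```python
import numpy as np

def gaussian_anchoring(latents, anchors, sigma=1.0):
    """Assign each latent code to its most responsible Gaussian anchor.

    Toy sketch under assumed mechanics: each row of `anchors` is the mean
    of an isotropic Gaussian with scale `sigma`; a latent code is snapped
    to the mean of the highest-likelihood anchor, which carves explicit
    decision boundaries between attribute values.
    """
    # Squared distance of every latent (N, D) to every anchor (K, D).
    d2 = ((latents[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    # Under isotropic Gaussians, the highest likelihood is the smallest d2.
    resp = np.argmax(-d2 / (2 * sigma ** 2), axis=1)
    return anchors[resp], resp

# Illustrative usage with two 2-D latents and two anchors.
latents = np.array([[0.1, 0.0], [2.9, 3.1]])
anchors = np.array([[0.0, 0.0], [3.0, 3.0]])
snapped, idx = gaussian_anchoring(latents, anchors)
```

Snapping to a shared set of anchors is one simple way to obtain attribute-separated latent units: latents encoding the same attribute value collapse onto a common representative.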
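The Skip Dropout idea can likewise be sketched in a few lines. This is a hedged toy version, assuming the technique amounts to stochastically suppressing whole U-Net skip connections during training so the denoiser cannot bypass the disentangled latent condition by copying low-level detail through the skips; the function signature, dropout granularity, and probability `p` are assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def skip_dropout(skip_features, p=0.5, training=True, rng=rng):
    """Drop an entire U-Net skip connection with probability p (toy sketch).

    Zeroing the skip path forces the decoder side of the denoising U-Net
    to rely on the conditioning latent units rather than on features
    shortcut from the encoder side.
    """
    if not training or rng.random() >= p:
        return skip_features
    return np.zeros_like(skip_features)

# Illustrative usage on a (batch, channels, H, W) encoder feature map.
feat = np.ones((2, 4, 8, 8))
out = skip_dropout(feat, p=1.0)   # p=1.0: the skip is always dropped
kept = skip_dropout(feat, p=0.0)  # p=0.0: the skip always passes through
```

At inference time one would pass `training=False` so the skip connections behave normally, mirroring standard dropout conventions.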