In this study, we examine the representation learning abilities of Denoising Diffusion Models (DDMs), which were originally designed for image generation. Our approach is to deconstruct a DDM, gradually transforming it into a classical Denoising Autoencoder (DAE). This deconstructive procedure allows us to explore how the various components of modern DDMs influence self-supervised representation learning. We observe that only a few modern components are critical for learning good representations, while many others are nonessential. Our study ultimately arrives at an approach that is highly simplified and, to a large extent, resembles a classical DAE. We hope our study will rekindle interest in a family of classical methods within the realm of modern self-supervised learning.