Learning rich data representations from unlabeled data is a key challenge in applying deep learning algorithms to downstream supervised tasks. Several variants of variational autoencoders (VAEs) have been proposed to learn compact data representations by encoding high-dimensional data in a lower-dimensional space. Two main classes of VAE methods can be distinguished by the meta-priors they enforce during representation learning. The first class derives a continuous encoding by assuming a static prior distribution in the latent space. The second class instead learns a discrete latent representation using vector quantization (VQ) with a codebook. However, both classes face challenges that can lead to suboptimal image reconstruction: the first suffers from posterior collapse, whereas the second suffers from codebook collapse. To address these challenges, we introduce a new VAE variant, termed SC-VAE (sparse coding-based VAE), which integrates sparse coding into the variational autoencoder framework. Instead of learning a continuous or discrete latent representation, the proposed method learns a sparse data representation consisting of a linear combination of a small number of learned atoms. The sparse coding problem is solved using a learnable version of the iterative shrinkage-thresholding algorithm (ISTA). Experiments on two image datasets demonstrate that our model can achieve improved image reconstruction results compared to state-of-the-art methods. Moreover, the learned sparse code vectors allow us to perform downstream tasks such as coarse image segmentation by clustering image patches.
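To make the sparse coding step concrete, here is a minimal pure-Python sketch of classic ISTA for the problem min_z 0.5·||x − Dz||² + λ·||z||₁, where D is a dictionary whose columns are the atoms. The abstract does not specify which learnable ISTA variant SC-VAE uses. As an assumption, the sketch writes each iteration in the form z ← soft(W_e·x + S·z, θ) used by LISTA-style unrolled networks, one common learnable variant. With W_e = Dᵀ/L, S = I − DᵀD/L and θ = λ/L, one such layer is exactly one ISTA step. In a learned version these three quantities would become trainable parameters. The helper names (`lista_params`, `ista`, etc.) are hypothetical.

```python
"""Minimal ISTA sketch for sparse coding: x ≈ D z with z sparse (illustrative only)."""
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(A: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def soft_threshold(v: Vector, theta: float) -> Vector:
    # Proximal operator of theta * ||z||_1: shrink each entry toward zero by theta.
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]


def lista_params(D: Matrix, lam: float) -> Tuple[Matrix, Matrix, float]:
    """Hypothetical helper: express ISTA on dictionary D in LISTA-style (W_e, S, theta) form."""
    Dt = transpose(D)  # shape: n_atoms x dim
    # Step size 1/L requires L >= largest eigenvalue of D^T D; the squared
    # Frobenius norm of D is a cheap (loose) upper bound on it.
    L = sum(d * d for row in D for d in row)
    n = len(Dt)
    DtD = [[sum(a * b for a, b in zip(Dt[i], Dt[j])) for j in range(n)] for i in range(n)]
    W_e = [[d / L for d in row] for row in Dt]
    S = [[(1.0 if i == j else 0.0) - DtD[i][j] / L for j in range(n)] for i in range(n)]
    return W_e, S, lam / L


def ista(D: Matrix, x: Vector, lam: float, n_iter: int = 200) -> Vector:
    """Run n_iter ISTA steps z <- soft(W_e x + S z, theta), starting from z = 0."""
    W_e, S, theta = lista_params(D, lam)
    b = matvec(W_e, x)  # W_e x does not change across iterations
    z = [0.0] * len(b)
    for _ in range(n_iter):
        z = soft_threshold([bi + si for bi, si in zip(b, matvec(S, z))], theta)
    return z


if __name__ == "__main__":
    # With D = I the exact solution is soft_threshold(x, lam).
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    z = ista(identity, [2.0, 0.05, -1.0], lam=0.1)
    print([round(v, 4) for v in z])  # → [1.9, 0.0, -0.9]
```

The identity-dictionary case is a convenient sanity check because its solution is known in closed form: the small second coefficient is zeroed out and the others shrink by λ. With an overcomplete D (more atoms than input dimensions), the same loop returns a code in which most entries are exactly zero, which is the kind of representation the abstract describes.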