We study the problem of self-supervised structured representation learning using autoencoders for downstream tasks such as generative modeling. Unlike most methods, which sample by matching an arbitrary, relatively unstructured prior distribution, we propose a sampling technique that relies solely on the independence of the latent variables, thereby avoiding the trade-off between reconstruction quality and generative performance typically observed in VAEs. We design a novel autoencoder architecture capable of learning a structured representation without the need for aggressive regularization. Our structural decoders learn a hierarchy of latent variables, which orders the information without any additional regularization or supervision. We demonstrate that these models learn a representation that improves results in a variety of downstream tasks, including generation, disentanglement, and extrapolation, on several challenging natural image datasets.
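To make the idea of sampling from latent independence alone more concrete, the sketch below resamples each latent dimension separately from a set of encoded vectors, so that no parametric prior has to be matched. It is an illustrative assumption about what such a sampler could look like, not necessarily the procedure used in the paper; the function name `sample_independent_latents` and the random stand-in for encoder outputs are hypothetical.

```python
import numpy as np


def sample_independent_latents(latents, num_samples, rng):
    """Draw new latent vectors by resampling each dimension independently.

    latents: array of shape (n, d), e.g. encoder outputs on training data.
    Returns an array of shape (num_samples, d) whose j-th column contains
    only values taken from the j-th column of `latents`.
    """
    latents = np.asarray(latents)
    n, d = latents.shape
    # One independent row index for every (sample, dimension) pair.
    rows = rng.integers(0, n, size=(num_samples, d))
    # Column indices broadcast against `rows`, so entry (i, j) of the
    # result is latents[rows[i, j], j].
    cols = np.arange(d)
    return latents[rows, cols]


# Stand-in for encoder outputs (assumption: 1000 codes with 4 dimensions).
rng = np.random.default_rng(0)
encoded = rng.normal(size=(1000, 4))
new_latents = sample_independent_latents(encoded, 8, rng)
```

If the latent dimensions really are independent, vectors assembled this way follow the same joint distribution as the encoded data, so they could be passed to a decoder to generate new samples. If the dimensions are correlated, the recombined vectors may fall outside the region the decoder has seen.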