Discrete latent variables are considered important for modeling real-world data, which has motivated research on Variational Autoencoders (VAEs) with discrete latents. However, standard VAE training is not possible in this case, and different strategies have therefore been developed to manipulate discrete distributions so that discrete VAEs can be trained similarly to conventional ones. Here we ask whether it is also possible to keep the discrete nature of the latents fully intact by applying direct discrete optimization to the encoding model. The approach consequently departs strongly from standard VAE training by sidestepping sampling-based approximation, the reparameterization trick, and amortization. Discrete optimization is realized in a variational setting using truncated posteriors in conjunction with evolutionary algorithms. For VAEs with binary latents, we show (A) how such a discrete variational method ties into gradient ascent for network weights, and (B) how the decoder is used to select latent states for training. Conventional amortized training is more efficient and applicable to large neural networks. However, using smaller networks, we find direct discrete optimization to scale efficiently to hundreds of latents. More importantly, we find direct optimization to be highly competitive in "zero-shot" learning. In contrast to large supervised networks, the VAEs investigated here can, for example, denoise a single image without previous training on clean data and/or on large image datasets. More generally, the studied approach shows that training VAEs is indeed possible without sampling-based approximation and reparameterization, which may be of interest for the analysis of VAE training in general. Furthermore, in "zero-shot" settings, direct optimization makes VAEs competitive where they have previously been outperformed by non-generative approaches.
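To make the idea of truncated posteriors combined with evolutionary search more concrete, the following is a minimal sketch and not the implementation used in this work. It assumes a hypothetical one-layer `tanh` decoder with weights `W` and bias `b`, a Bernoulli prior with activation probability `pi`, Gaussian observation noise `sigma`, and single bit flips as the only mutation operator; all function names and parameter values are illustrative. For one data point, a small set of binary latent states is evolved, and the decoder is used to score candidate states by their log-joint probability.

```python
import numpy as np

def log_joint(x, states, W, b, pi=0.2, sigma=1.0):
    """log p(x, z) up to a constant, for each binary row z of `states`."""
    mean = np.tanh(states @ W.T + b)  # assumed toy decoder, shape (n_states, D)
    log_lik = -0.5 * np.sum((x - mean) ** 2, axis=1) / sigma**2
    n_on = states.sum(axis=1)
    n_off = states.shape[1] - n_on
    log_prior = n_on * np.log(pi) + n_off * np.log(1.0 - pi)
    return log_lik + log_prior

def evolve(x, states, W, b, rng, size, n_children=2):
    """One evolutionary update of the state set for a single data point."""
    H = states.shape[1]
    children = np.repeat(states, n_children, axis=0)
    rows = np.arange(children.shape[0])
    flip = rng.integers(0, H, size=children.shape[0])
    children[rows, flip] ^= 1  # mutate each child by flipping one random bit
    # Pool parents and children, drop duplicates, keep the best `size` states.
    pool = np.unique(np.vstack([states, children]), axis=0)
    scores = log_joint(x, pool, W, b)
    return pool[np.argsort(scores)[-size:]]

def truncated_posterior(x, states, W, b):
    """Posterior weights restricted (truncated) to the current state set."""
    s = log_joint(x, states, W, b)
    w = np.exp(s - s.max())
    return w / w.sum()

rng = np.random.default_rng(0)
D, H, S = 16, 8, 6                     # data dim, latent dim, states kept
W = rng.normal(size=(D, H))            # random decoder weights (untrained)
b = np.zeros(D)
x = rng.normal(size=D)                 # a single synthetic data point
states = np.eye(H, dtype=np.int64)[:S]  # S distinct initial states

# Truncated free energy: log-sum-exp of the log-joints over the state set.
free_energy = [np.logaddexp.reduce(log_joint(x, states, W, b))]
for _ in range(20):
    states = evolve(x, states, W, b, rng, size=S)
    free_energy.append(np.logaddexp.reduce(log_joint(x, states, W, b)))

q = truncated_posterior(x, states, W, b)
```

Because the parents always remain in the candidate pool, the truncated free energy in this sketch cannot decrease from one update to the next. In a full training loop, the weights `q` would then be used to form the weighted log-joint whose gradient with respect to the decoder parameters drives gradient ascent; that step is omitted here.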