The variational autoencoder (VAE) typically employs a standard normal prior as a regularizer for the probabilistic latent encoder. However, the Gaussian tail often decays too quickly to accommodate the encoded points effectively, so the model fails to preserve crucial structures hidden in the data. In this paper, we explore the use of heavy-tailed models to combat over-regularization. Drawing upon insights from information geometry, we propose $t^3$VAE, a modified VAE framework that incorporates Student's t-distributions for the prior, encoder, and decoder. This results in a joint model distribution of a power form, which we argue can better fit real-world datasets. We derive a new objective by reformulating the evidence lower bound as a joint optimization of the KL divergence between two statistical manifolds and then replacing the KL divergence with the $\gamma$-power divergence, a natural alternative for power families. $t^3$VAE demonstrates superior generation of low-density regions when trained on heavy-tailed synthetic data. Furthermore, we show that $t^3$VAE significantly outperforms other models on CelebA and imbalanced CIFAR-100 datasets.
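To make the tail-decay argument concrete, the following minimal sketch compares the log-density of a standard normal with that of a univariate Student's t-distribution at increasing distances from the origin. It is an illustration only and is not taken from the $t^3$VAE implementation: the function names `normal_logpdf` and `student_t_logpdf` and the choice of $\nu = 5$ degrees of freedom are assumptions made for this example.

```python
import math


def normal_logpdf(x: float) -> float:
    """Log-density of the standard normal N(0, 1) at x."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def student_t_logpdf(x: float, nu: float) -> float:
    """Log-density of a univariate Student's t with nu degrees of freedom at x."""
    log_norm = (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi)
    )
    return log_norm - 0.5 * (nu + 1.0) * math.log1p(x * x / nu)


if __name__ == "__main__":
    nu = 5.0  # illustrative choice, not a value prescribed by the paper
    for x in (0.0, 2.0, 4.0, 6.0):
        gap = student_t_logpdf(x, nu) - normal_logpdf(x)
        print(f"x={x:.1f}  log t - log normal = {gap:.3f}")
```

The Gaussian log-density falls quadratically in $x$, while the Student's t log-density falls only logarithmically, so the gap printed above widens as $|x|$ grows. This is the sense in which a heavy-tailed prior leaves more room for encoded points far from the origin.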