Variational Autoencoders (VAEs) have gained significant popularity among researchers as a powerful tool for understanding unknown distributions based on limited samples. This popularity stems partly from their impressive performance and partly from their ability to provide meaningful feature representations in the latent space. Wasserstein Autoencoders (WAEs), a variant of VAEs, aim to improve not only model efficiency but also interpretability. However, their statistical guarantees have received little analysis. The matter is further complicated by the fact that the data distributions to which WAEs are applied, such as natural images, are often presumed to possess an underlying low-dimensional structure within a high-dimensional feature space. Current theory does not adequately account for this structure, which renders known bounds inefficient. To bridge the gap between the theory and practice of WAEs, we show in this paper that WAEs can learn the data distributions when the network architectures are properly chosen. We further show that the convergence rates of the expected excess risk in the number of samples for WAEs are independent of the high feature dimension and instead depend only on the intrinsic dimension of the data distribution.