Variational autoencoders (VAEs) suffer from a well-known problem in which the variational posterior aligns closely with the prior, a phenomenon known as posterior collapse, which degrades the quality of representation learning. To mitigate this problem, an adjustable hyperparameter $\beta$ and a strategy for annealing it, called KL annealing, have been proposed. This study presents a theoretical analysis of the learning dynamics in a minimal VAE. It is rigorously proved that the dynamics converge to a deterministic process in the limit of large input dimension, which enables a detailed dynamical analysis of the generalization error. The analysis shows that the VAE initially learns entangled representations and gradually acquires disentangled ones. A fixed-point analysis of the deterministic process reveals that when $\beta$ exceeds a certain threshold, posterior collapse becomes inevitable regardless of the length of the learning period. In addition, latent variables that are superfluous relative to the data-generative factors lead to overfitting of the background noise, which adversely affects both generalization and learning convergence. The analysis further reveals that appropriately tuned KL annealing can accelerate convergence.
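
For orientation, the role of $\beta$ and of KL annealing can be illustrated with a minimal sketch. It assumes a diagonal-Gaussian posterior, a standard-normal prior, and a linear warm-up of $\beta$; the function names (`kl_to_standard_normal`, `beta_schedule`, `beta_vae_loss`) and the linear schedule are hypothetical choices made for illustration, not the specific model or annealing schedule analyzed in this study.

```python
import numpy as np


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over the latent axis."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


def beta_schedule(step, warmup_steps, beta_max=1.0):
    """Assumed linear KL annealing: beta grows from 0 to beta_max, then stays flat."""
    if warmup_steps <= 0:
        return beta_max
    return beta_max * min(1.0, step / warmup_steps)


def beta_vae_loss(recon_error, mu, log_var, beta):
    """Beta-weighted objective: reconstruction error plus beta times the KL term."""
    return recon_error + beta * kl_to_standard_normal(mu, log_var)
```

When the posterior equals the prior (`mu = 0`, `log_var = 0`), the KL term is exactly zero, which is the collapsed solution; a larger `beta` makes that solution more attractive to the optimizer, while annealing keeps `beta` small early in training.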