The $β$-VAE is a foundational framework for unsupervised disentanglement, using $β$ to regulate the trade-off between latent factorization and reconstruction fidelity. Empirically, however, disentanglement performance exhibits a pervasive non-monotonic trend: benchmarks such as MIG and SAP typically peak at intermediate $β$ and collapse as regularization increases. We demonstrate that this collapse is a fundamental information-theoretic failure, where strong Kullback-Leibler pressure promotes marginal independence at the expense of the latent channel's semantic informativeness. By formalizing this mechanism in a linear-Gaussian setting, we prove that for $β> 1$, stationarity-induced dynamics trigger a spectral contraction of the encoder gain, driving latent-factor mutual information to zero. To resolve this, we introduce the $λβ$-VAE, which decouples regularization pressure from informational collapse via an auxiliary $L_2$ reconstruction penalty $λ$. Extensive experiments on dSprites, Shapes3D, and MPI3D-real confirm that $λ> 0$ stabilizes disentanglement and restores latent informativeness over a significantly broader range of $β$, providing a principled theoretical justification for dual-parameter regularization in variational inference backbones.
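The dual-parameter objective described above can be sketched as a per-sample loss. This is a minimal illustration, not the paper's implementation: the function name `lambda_beta_vae_loss`, the unit-variance Gaussian likelihood (squared error), and the exact placement of the auxiliary $λ$-weighted $L_2$ term are assumptions for clarity.

```python
import numpy as np

def lambda_beta_vae_loss(x, x_hat, mu, log_var, beta=4.0, lam=0.5):
    """Sketch of a per-sample λβ-VAE objective (hypothetical signature):
    the β-VAE terms plus an auxiliary L2 reconstruction penalty weighted
    by `lam`, which counteracts the KL-driven informational collapse."""
    # Reconstruction term: squared error (Gaussian NLL up to constants).
    recon = np.sum((x - x_hat) ** 2, axis=-1)
    # KL(q(z|x) || N(0, I)) for a diagonal-Gaussian encoder, closed form.
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)
    # Auxiliary L2 penalty: keeps reconstruction pressure on the decoder
    # even when beta > 1 pushes the KL term toward spectral contraction.
    aux = lam * np.sum((x - x_hat) ** 2, axis=-1)
    return recon + beta * kl + aux
```

With $λ = 0$ this reduces to the standard $β$-VAE loss; increasing $λ$ re-weights reconstruction fidelity independently of the KL pressure, which is the decoupling the abstract attributes to the $λβ$-VAE.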