Kullback-Leibler (KL) divergence is a fundamental concept in information theory that quantifies the discrepancy between two probability distributions. In the context of Variational Autoencoders (VAEs), it serves as a central regularization term, imposing structure on the latent space and thereby enabling the model to exhibit generative capabilities. In this work, we present a detailed derivation of the closed-form expression for the KL divergence between Gaussian distributions, a case of particular importance in practical VAE implementations. Starting from the general definition for continuous random variables, we derive the expression for the univariate case and extend it to the multivariate setting under the assumption of diagonal covariance. Finally, we discuss the interpretation of each term in the resulting expression and its impact on the training dynamics of the model.
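As a concrete reference point for the closed-form expressions described above, the following is a minimal sketch in plain Python, not the derivation or code of this work. It assumes the standard results KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) = log(sigma2/sigma1) + (sigma1^2 + (mu1 - mu2)^2) / (2 sigma2^2) - 1/2 for the univariate case and, for a diagonal-covariance Gaussian measured against a standard normal prior, 1/2 * sum(mu^2 + sigma^2 - 1 - log sigma^2). The function names and the log-variance parameterization are illustrative assumptions; the notation used in the paper itself may differ.

```python
import math


def kl_gaussian_1d(mu1, sigma1, mu2, sigma2):
    """Closed-form KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ), in nats."""
    return (
        math.log(sigma2 / sigma1)
        + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
        - 0.5
    )


def kl_diag_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    With a diagonal covariance the divergence is a sum of independent
    per-dimension terms. `log_var` holds log(sigma^2) for each dimension
    (an assumed parameterization, common in VAE encoders).
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def kl_numeric_1d(mu1, sigma1, mu2, sigma2, n=20_000, width=12.0):
    """Midpoint-rule estimate of the integral of p(x) * log(p(x) / q(x)).

    Used only as a sanity check of the closed form against the general
    definition for continuous random variables.
    """
    def log_pdf(x, mu, sigma):
        return (
            -0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2.0 * sigma ** 2)
        )

    lo = mu1 - width * sigma1
    hi = mu1 + width * sigma1
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        log_p = log_pdf(x, mu1, sigma1)
        total += math.exp(log_p) * (log_p - log_pdf(x, mu2, sigma2)) * dx
    return total
```

In this sketch the squared-mean term penalizes latent means that drift from the prior's mean, while the variance and log-variance terms together penalize variances that differ from one. Identical distributions give a divergence of zero, and comparing `kl_numeric_1d` with `kl_gaussian_1d` on the same arguments is one way to check the closed form against the integral definition.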