Exponential families encompass the distributions central to modern machine learning -- softmax, Gaussians, and Boltzmann distributions -- and underlie the theory of variational inference, entropy-regularized reinforcement learning, and RLHF. We isolate a simple identity for exponential families that expresses the KL difference $\mathrm{KL}(q \| p_{\lambda_2}) - \mathrm{KL}(q \| p_{\lambda_1})$ in terms of the log-partition function $A(\lambda)$ and the moment $\mu_q$. Remarkably, this identity together with the single fact that $\mathrm{KL} \geq 0$ (with equality iff $p = q$) suffices, by direct substitution and rearrangement, to derive a cluster of results that are classically obtained by separate, heavier arguments: a generalized three-point identity for arbitrary reference distributions, Pythagorean theorems for I-projections and reverse I-projections, convexity of the log-partition function, identification of its Legendre dual in KL terms, the Gibbs variational principle, and the explicit optimizer in KL-regularized reward maximization, including the exponential tilting formula underlying entropy-regularized control and RLHF. Beyond these purely algebraic consequences, standard analytic arguments recover the gradient formula for the log-partition function, the Bregman representation of within-family KL divergence, and the surjectivity of the moment map. The note is self-contained.
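The central identity, $\mathrm{KL}(q \| p_{\lambda_2}) - \mathrm{KL}(q \| p_{\lambda_1}) = A(\lambda_2) - A(\lambda_1) - \langle \lambda_2 - \lambda_1, \mu_q \rangle$, can be checked numerically on a finite exponential family $p_\lambda(x) \propto \exp(\langle \lambda, T(x) \rangle)$. A minimal sketch (the setup below, with its randomly drawn statistics and distributions, is illustrative and not from the note itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite exponential family over n outcomes with sufficient statistic T(x) in R^d:
# p_lam(x) = exp(lam . T(x) - A(lam)), uniform base measure.
n, d = 5, 3
T = rng.normal(size=(n, d))            # sufficient statistics, one row per outcome
lam1 = rng.normal(size=d)
lam2 = rng.normal(size=d)
q = rng.dirichlet(np.ones(n))          # arbitrary distribution q on the n outcomes

def A(lam):
    """Log-partition function A(lam) = log sum_x exp(lam . T(x))."""
    return np.log(np.exp(T @ lam).sum())

def p(lam):
    """Family member p_lam as a probability vector."""
    return np.exp(T @ lam - A(lam))

def kl(a, b):
    """KL(a || b) for probability vectors with full support."""
    return float(np.sum(a * np.log(a / b)))

mu_q = q @ T                           # moment of q: E_q[T(x)]

lhs = kl(q, p(lam2)) - kl(q, p(lam1))
rhs = A(lam2) - A(lam1) - (lam2 - lam1) @ mu_q
assert np.isclose(lhs, rhs)            # the KL-difference identity holds
```

The assertion passes because $\mathrm{KL}(q \| p_\lambda) = -H(q) - \langle \lambda, \mu_q \rangle + A(\lambda)$, so the entropy term $-H(q)$ cancels in the difference.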