Kullback-Leibler (KL) divergence regularization is widely used in reinforcement learning, but it becomes infinite under support mismatch and can degenerate in low-noise limits. Using a unified information-geometric framework, we introduce (Kalman-)Wasserstein-based KL analogues by replacing the Fisher-Rao geometry in the dynamical formulation of the KL divergence with transport-based geometries, and we derive closed-form values for common distribution families. These divergences remain finite under support mismatch and yield a geometric interpretation of regularization heuristics used in ensemble Kalman methods. We demonstrate the utility of these divergences in KL-regularized optimal control. In the fully tractable setting of linear time-invariant systems with Gaussian process noise, the classical KL penalty reduces to a quadratic control cost that becomes singular as the process noise vanishes. Our variants remove this singularity, yielding well-posed problems. On a double-integrator and a cart-pole example, the resulting controllers outperform KL-based regularization.