Choosing the Fisher information as the metric tensor for a Riemannian manifold provides a fundamental and powerful way to understand families of statistical distributions. Distances on this manifold give a compelling measure of statistical distance, and shorter paths improve sampling techniques that operate on a sequence of distributions. Unfortunately, even for a distribution as generally tractable as the multivariate normal, this information geometry proves unwieldy enough that closed-form solutions for shortest paths or their lengths remain unavailable outside of a limited set of special cases. In this review we present, for general statisticians, the most practical aspects of the Fisher geometry for this fundamental distribution family. Rather than giving a differential geometric treatment, we use an intuitive understanding of the covariance-induced curvature of this manifold to unify the special cases with known closed-form solutions and to review approximate solutions for the general case. We also use the information geometry of the multivariate normal to better understand the paths and distances commonly used in statistics (annealing paths, Wasserstein distance). Given the unavailability of a general solution, we also discuss the methods used for numerically obtaining geodesics in the space of multivariate normals, identifying remaining challenges and suggesting methodological improvements.
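As a concrete illustration of the "special cases with known closed-form solutions" mentioned above, the following minimal sketch evaluates two standard Fisher-Rao distances: the univariate normal case, and the multivariate case in which both distributions share the same mean. It is not taken from the review itself; the function names `fisher_rao_univariate` and `fisher_rao_fixed_mean` are hypothetical, and the formulas are the well-known textbook results rather than the authors' own derivations.

```python
import numpy as np


def fisher_rao_univariate(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Uses the hyperbolic half-plane form of the metric
    ds^2 = (d mu^2 + 2 d sigma^2) / sigma^2.
    """
    arg = 1.0 + ((mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2) / (
        4.0 * sigma1 * sigma2
    )
    return float(np.sqrt(2.0) * np.arccosh(arg))


def fisher_rao_fixed_mean(cov1, cov2):
    """Fisher-Rao distance between N(m, cov1) and N(m, cov2) with a shared mean.

    Equals sqrt(0.5 * sum(log(lam_i)^2)), where lam_i are the eigenvalues
    of cov1^{-1} cov2. Both inputs are assumed symmetric positive definite.
    """
    cov1 = np.asarray(cov1, dtype=float)
    cov2 = np.asarray(cov2, dtype=float)
    # Whiten cov2 by the Cholesky factor of cov1 so the eigenproblem is symmetric:
    # L^{-1} cov2 L^{-T} has the same eigenvalues as cov1^{-1} cov2.
    chol = np.linalg.cholesky(cov1)
    whitened = np.linalg.solve(chol, np.linalg.solve(chol, cov2).T)
    lam = np.linalg.eigvalsh(whitened)
    return float(np.sqrt(0.5 * np.sum(np.log(lam) ** 2)))


if __name__ == "__main__":
    # In one dimension with equal means, the two formulas should agree.
    print(fisher_rao_univariate(0.0, 1.0, 0.0, 2.0))
    print(fisher_rao_fixed_mean([[1.0]], [[4.0]]))
```

When the means and the covariances both differ in more than one dimension, no comparable closed form is available, which is the situation the numerical and approximate methods discussed in the review address.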