Data sets of multivariate normal distributions abound in many scientific areas such as diffusion tensor imaging, structure tensor computer vision, radar signal processing, and machine learning, to name a few. To process such data sets for downstream tasks like filtering, classification, or clustering, one needs to define proper notions of dissimilarity between normal distributions and of paths joining them. The Fisher-Rao distance, defined as the Riemannian geodesic distance induced by the Fisher information metric, is such a principled metric distance; however, it is not known in closed form except for a few particular cases. In this work, we first report a fast and robust method to approximate the Fisher-Rao distance between multivariate normal distributions arbitrarily finely. Second, we introduce a class of distances based on diffeomorphic embeddings of the normal manifold into a submanifold of the higher-dimensional symmetric positive-definite cone, which corresponds to the manifold of centered normal distributions. We show that the projective Hilbert distance on the cone yields a metric on the embedded normal submanifold, and we pull back that cone distance, together with its associated straight-line Hilbert cone geodesics, to obtain a distance and smooth paths between normal distributions. Compared to the Fisher-Rao distance approximation, the pullback Hilbert cone distance is computationally light, since it requires computing only the extreme (minimal and maximal) eigenvalues of matrices. Finally, we show how to use those distances in clustering tasks.
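As an illustration of why the pullback Hilbert cone distance is cheap to evaluate, the following is a minimal sketch rather than the authors' implementation. It assumes one particular diffeomorphic embedding, which the abstract does not spell out: a d-variate normal N(μ, Σ) is mapped to the (d+1)×(d+1) symmetric positive-definite matrix [[Σ + μμᵀ, μ], [μᵀ, 1]]. The Hilbert projective distance between two cone elements P and Q is taken as log(λ_max / λ_min) of the eigenvalues of P⁻¹Q. The function names `embed_normal`, `hilbert_cone_distance`, and `pullback_hilbert_distance` are hypothetical.

```python
import numpy as np


def embed_normal(mu, Sigma):
    """Embed N(mu, Sigma) into the SPD cone of dimension d+1.

    Assumed embedding (not stated in the abstract):
        [[Sigma + mu mu^T, mu],
         [mu^T,            1 ]]
    Its Schur complement with respect to the bottom-right entry is Sigma,
    so the result is positive-definite whenever Sigma is.
    """
    mu = np.asarray(mu, dtype=float).reshape(-1, 1)
    Sigma = np.asarray(Sigma, dtype=float)
    top = np.hstack([Sigma + mu @ mu.T, mu])
    bottom = np.hstack([mu.T, np.ones((1, 1))])
    return np.vstack([top, bottom])


def hilbert_cone_distance(P, Q):
    """Hilbert projective distance log(lambda_max / lambda_min) of P^{-1} Q."""
    L = np.linalg.cholesky(P)  # P = L L^T
    # M = L^{-1} Q L^{-T} is symmetric and shares its eigenvalues with P^{-1} Q.
    M = np.linalg.solve(L, np.linalg.solve(L, Q).T)
    M = 0.5 * (M + M.T)  # remove floating-point asymmetry
    lam = np.linalg.eigvalsh(M)  # eigenvalues in ascending order
    return float(np.log(lam[-1] / lam[0]))


def pullback_hilbert_distance(mu1, Sigma1, mu2, Sigma2):
    """Distance between two normals via the embedded cone distance."""
    return hilbert_cone_distance(embed_normal(mu1, Sigma1),
                                 embed_normal(mu2, Sigma2))


# Example: two bivariate normal distributions (illustrative values only).
mu_a, Sigma_a = np.array([0.0, 0.0]), np.array([[1.0, 0.2], [0.2, 2.0]])
mu_b, Sigma_b = np.array([1.0, -1.0]), np.array([[0.5, 0.0], [0.0, 1.5]])
d_ab = pullback_hilbert_distance(mu_a, Sigma_a, mu_b, Sigma_b)
```

For clarity the sketch computes the full spectrum with `eigvalsh`; since only the two extreme eigenvalues are needed, an iterative extreme-eigenvalue solver could replace it for large dimensions. Note that on the full cone the distance is only projective (it vanishes between P and any positive multiple of P); fixing the bottom-right entry of the embedded matrix to 1 removes that scaling freedom on the embedded submanifold.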