Data sets of multivariate normal distributions abound in many scientific areas, such as diffusion tensor imaging, structure tensor computer vision, radar signal processing, and machine learning, to name just a few. In order to process those normal data sets for downstream tasks like filtering, classification, or clustering, one needs to define proper notions of dissimilarities between normals and of paths joining them. The Fisher-Rao distance, defined as the Riemannian geodesic distance induced by the Fisher information metric, is such a principled metric distance; however, it is not known in closed form except in a few particular cases. In this work, we first report a fast and robust method to approximate arbitrarily finely the Fisher-Rao distance between multivariate normal distributions. Second, we introduce a class of distances based on diffeomorphic embeddings of the normal manifold into a submanifold of the higher-dimensional symmetric positive-definite cone corresponding to the manifold of centered normal distributions. We show that the projective Hilbert distance on the cone yields a metric on the embedded normal submanifold, and we pull back that cone distance, with its associated straight-line Hilbert cone geodesics, to obtain a distance and smooth paths between normal distributions. Compared to the Fisher-Rao distance approximation, the pullback Hilbert cone distance is computationally light, since it requires computing only the extreme (minimal and maximal) eigenvalues of matrices. Finally, we show how to use those distances in clustering tasks.
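To make the pullback construction concrete, the following is a minimal sketch rather than the paper's implementation. It assumes one particular embedding, which the abstract does not specify: a d-variate normal N(mu, Sigma) is mapped to the (d+1)x(d+1) symmetric positive-definite matrix [[Sigma + mu mu^T, mu], [mu^T, 1]]. The Hilbert projective distance on the cone is taken to be the logarithm of the ratio of the largest to the smallest eigenvalue of P^{-1} Q. The function names `embed_normal`, `hilbert_cone_distance`, and `pullback_hilbert_distance` are hypothetical and chosen for illustration only.

```python
import numpy as np


def embed_normal(mu, Sigma):
    """Map N(mu, Sigma) to a (d+1)x(d+1) SPD matrix.

    Assumed embedding (not stated in the abstract):
        [[Sigma + mu mu^T, mu],
         [mu^T,            1 ]]
    """
    mu = np.asarray(mu, dtype=float).reshape(-1, 1)
    Sigma = np.asarray(Sigma, dtype=float)
    top = np.hstack([Sigma + mu @ mu.T, mu])
    bottom = np.hstack([mu.T, np.ones((1, 1))])
    return np.vstack([top, bottom])


def hilbert_cone_distance(P, Q):
    """Hilbert projective distance log(lambda_max / lambda_min) of P^{-1} Q."""
    # Whiten Q by the Cholesky factor of P so that a symmetric eigensolver
    # can be used: L^{-1} Q L^{-T} has the same eigenvalues as P^{-1} Q.
    L = np.linalg.cholesky(P)
    A = np.linalg.solve(L, Q)          # L^{-1} Q
    M = np.linalg.solve(L, A.T).T      # L^{-1} Q L^{-T}
    M = 0.5 * (M + M.T)                # remove round-off asymmetry
    w = np.linalg.eigvalsh(M)          # eigenvalues in ascending order
    return float(np.log(w[-1] / w[0]))


def pullback_hilbert_distance(mu1, Sigma1, mu2, Sigma2):
    """Distance between two normals via the embedded SPD matrices."""
    return hilbert_cone_distance(embed_normal(mu1, Sigma1),
                                 embed_normal(mu2, Sigma2))


if __name__ == "__main__":
    d = pullback_hilbert_distance([0.0, 0.0], np.eye(2),
                                  [1.0, 0.0], 2.0 * np.eye(2))
    print(d)
```

On the whole cone the Hilbert distance is only projective: it vanishes between P and any positive multiple of P. Under the assumed embedding the bottom-right entry is fixed to 1, so two distinct embedded matrices are never positive multiples of each other, which is consistent with the claim that the distance becomes a metric on the embedded submanifold. The sketch computes a full symmetric eigendecomposition for simplicity; since only the two extreme eigenvalues are needed, an iterative extreme-eigenvalue solver could be substituted for large matrices.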