Data sets of multivariate normal distributions abound in many scientific areas, such as diffusion tensor imaging, structure tensor computer vision, radar signal processing, and machine learning, to name just a few. To process these normal data sets for downstream tasks such as filtering, classification, or clustering, one needs to define proper notions of dissimilarity between normals and of paths joining them. The Fisher-Rao distance, defined as the Riemannian geodesic distance induced by the Fisher information metric, is such a principled metric distance; however, it is not known in closed form except for a few particular cases. In this work, we first report a fast and robust method to approximate the Fisher-Rao distance between multivariate normal distributions arbitrarily finely. Second, we introduce a class of distances based on diffeomorphic embeddings of the normal manifold into a submanifold of the higher-dimensional symmetric positive-definite cone corresponding to the manifold of centered normal distributions. We show that the projective Hilbert distance on the cone yields a metric on the embedded normal submanifold, and we pull back that cone distance, together with its associated straight-line Hilbert cone geodesics, to obtain a distance and smooth paths between normal distributions. Compared to the Fisher-Rao distance approximation, the pullback Hilbert cone distance is computationally light since it requires computing only the extreme (minimal and maximal) eigenvalues of matrices. Finally, we show how to use those distances in clustering tasks.
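As a rough illustration of why the pullback Hilbert cone distance is cheap to evaluate, the following minimal sketch computes it with NumPy. It rests on two assumptions that the abstract does not spell out: the embedding is taken to be (μ, Σ) ↦ [[Σ + μμᵀ, μ], [μᵀ, 1]], which is one known diffeomorphic embedding of d-variate normals into the (d+1)-dimensional symmetric positive-definite cone, and the Hilbert projective distance between cone points P and Q is taken in its standard form log(λ_max(P⁻¹Q) / λ_min(P⁻¹Q)). The function names `embed_normal`, `hilbert_cone_distance`, and `pullback_hilbert_distance` are hypothetical and chosen for illustration only.

```python
import numpy as np


def embed_normal(mu, sigma):
    """Map N(mu, sigma) to a (d+1)x(d+1) SPD matrix.

    Assumed embedding (not specified in the abstract):
        [[sigma + mu mu^T, mu],
         [mu^T,            1 ]]
    """
    mu = np.asarray(mu, dtype=float).reshape(-1, 1)
    sigma = np.asarray(sigma, dtype=float)
    top = np.hstack([sigma + mu @ mu.T, mu])
    bottom = np.hstack([mu.T, np.ones((1, 1))])
    return np.vstack([top, bottom])


def hilbert_cone_distance(P, Q):
    """Hilbert projective distance log(lambda_max / lambda_min) of P^{-1} Q."""
    L = np.linalg.cholesky(P)
    # M = L^{-1} Q L^{-T} is symmetric and has the same eigenvalues as P^{-1} Q.
    A = np.linalg.solve(L, Q)
    M = np.linalg.solve(L, A.T).T
    # Symmetrize to suppress round-off before the symmetric eigensolver.
    lam = np.linalg.eigvalsh((M + M.T) / 2.0)  # eigenvalues in ascending order
    return float(np.log(lam[-1] / lam[0]))


def pullback_hilbert_distance(mu1, sigma1, mu2, sigma2):
    """Distance between two normals obtained by pulling back the cone distance."""
    return hilbert_cone_distance(embed_normal(mu1, sigma1),
                                 embed_normal(mu2, sigma2))


if __name__ == "__main__":
    d = pullback_hilbert_distance([0.0, 0.0], np.eye(2),
                                  [1.0, -1.0], [[2.0, 0.3], [0.3, 0.5]])
    print(d)
```

On the full cone the Hilbert distance is only projective: it vanishes between P and any positive multiple of P. Under the assumed embedding, every embedded matrix has bottom-right entry 1, so no two distinct embedded matrices are positive multiples of each other. This sketch computes all eigenvalues for simplicity; only the smallest and largest are actually needed, so an iterative extreme-eigenvalue solver could be substituted for larger dimensions.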