High-dimensional datasets often concentrate near low-dimensional structures, but estimating their geometry from samples typically relies on graphs and kernels that scale poorly with dataset size and dimension. We propose Riemannian metric matching: a denoising probabilistic framework for learning the Riemannian geometry of data using neural networks. Specifically, we learn the carré du champ operator, which, through diffusion geometry, gives access to the Riemannian geometry toolkit for downstream machine learning and statistical tasks. Our key observation is that the carré du champ operator can be formulated as a conditional expectation over random perturbations of the data. This formulation enables sample-wise training and constant-cost amortized inference without explicit kernel construction. Empirically, metric matching matches or improves on the accuracy of $k$-NN-based diffusion geometry estimators, while enabling amortized inference that is up to $400\times$ faster, and it supports graph-free geometric analysis on high-dimensional images where nearest neighbors break down.
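To make the conditional-expectation view concrete, the following is a minimal sketch in flat Euclidean space, not the neural estimator proposed here. It assumes the convention $\Gamma(f,g) = \tfrac{1}{2}\left(\Delta(fg) - f\,\Delta g - g\,\Delta f\right) = \langle \nabla f, \nabla g \rangle$ and uses the standard fact that, for isotropic Gaussian perturbations $y = x + \sigma\varepsilon$, the quantity $\mathbb{E}[(f(y)-f(x))(g(y)-g(x))]/\sigma^2$ tends to $\langle \nabla f, \nabla g \rangle(x)$ as $\sigma \to 0$. The function name `carre_du_champ_mc`, the test functions, and all parameter values are illustrative assumptions.

```python
import math
import random


def carre_du_champ_mc(f, g, x, sigma=1e-2, n_samples=20000, seed=0):
    """Monte Carlo estimate of <grad f, grad g>(x) from Gaussian perturbations.

    Averages (f(y) - f(x)) * (g(y) - g(x)) / sigma**2 over y = x + sigma * eps,
    eps ~ N(0, I). Illustrative only: Euclidean space, no learned model.
    """
    rng = random.Random(seed)
    fx, gx = f(x), g(x)
    total = 0.0
    for _ in range(n_samples):
        # Perturb every coordinate of x with independent Gaussian noise.
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        total += (f(y) - fx) * (g(y) - gx)
    return total / (n_samples * sigma ** 2)


# Hypothetical test functions on R^2 with known gradients.
def f(p):
    return math.sin(p[0]) + p[1] ** 2


def g(p):
    return p[0] * p[1]


x = [0.3, -0.7]
estimate = carre_du_champ_mc(f, g, x)

# Analytic value: grad f = (cos x0, 2 x1), grad g = (x1, x0).
exact = math.cos(x[0]) * x[1] + 2.0 * x[1] * x[0]
print(f"Monte Carlo: {estimate:.3f}, analytic: {exact:.3f}")
```

Each sample contributes independently to the average, which is the property that makes a sample-wise training objective plausible. On a data manifold the perturbations and the conditioning differ from this toy setting, so the sketch should be read only as an illustration of the identity.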