Motivated by the manifold hypothesis, which states that data with a high extrinsic dimension may yet have a low intrinsic dimension, we develop refined statistical bounds for entropic optimal transport that are sensitive to the intrinsic dimension of the data. Our bounds involve a robust notion of intrinsic dimension, measured at only a single distance scale depending on the regularization parameter, and show that it is only the minimum of these single-scale intrinsic dimensions which governs the rate of convergence. We call this the Minimum Intrinsic Dimension scaling (MID scaling) phenomenon, and establish MID scaling with no assumptions on the data distributions so long as the cost is bounded and Lipschitz, and for various entropic optimal transport quantities beyond just values, with stronger analogs when one distribution is supported on a manifold. Our results significantly advance the theoretical state of the art by showing that MID scaling is a generic phenomenon, and provide the first rigorous interpretation of the statistical effect of entropic regularization as a distance scale.