Motivated by the manifold hypothesis, which states that data with a high extrinsic dimension may yet have a low intrinsic dimension, we develop refined statistical bounds for entropic optimal transport that are sensitive to the intrinsic dimension of the data. Our bounds involve a robust notion of intrinsic dimension, measured at only a single distance scale that depends on the regularization parameter, and show that only the minimum of these single-scale intrinsic dimensions governs the rate of convergence. We call this the Minimum Intrinsic Dimension scaling (MID scaling) phenomenon. We establish MID scaling with no assumptions on the data distributions, so long as the cost is bounded and Lipschitz, and for various entropic optimal transport quantities beyond just values; stronger analogs hold when one distribution is supported on a manifold. Our results significantly advance the theoretical state of the art by showing that MID scaling is a generic phenomenon, and provide the first rigorous interpretation of the statistical effect of entropic regularization as a distance scale.