Convergence Rates for Non-Log-Concave Sampling and Log-Partition Estimation

Source: arXiv. Changes in v3: minor corrections and improvements. Plots can be reproduced using the code at https://github.com/dholzmueller/sampling_experiments

Sampling from Gibbs distributions $p(x) \propto \exp(-V(x)/\varepsilon)$ and computing their log-partition function are fundamental tasks in statistics, machine learning, and statistical physics. However, while efficient algorithms are known for convex potentials $V$, the situation is much more difficult in the non-convex case, where algorithms necessarily suffer from the curse of dimensionality in the worst case. For optimization, which can be seen as a low-temperature limit of sampling, it is known that smooth functions $V$ allow faster convergence rates. Specifically, for $m$-times differentiable functions in $d$ dimensions, the optimal rate for algorithms with $n$ function evaluations is known to be $O(n^{-m/d})$, where the constant can potentially depend on $m, d$ and the function to be optimized. Hence, the curse of dimensionality can be alleviated for smooth functions at least in terms of the convergence rate. Recently, it has been shown that similarly fast rates can also be achieved with polynomial runtime $O(n^{3.5})$, where the exponent $3.5$ is independent of $m$ or $d$. Hence, it is natural to ask whether similar rates for sampling and log-partition computation are possible, and whether they can be realized in polynomial time with an exponent independent of $m$ and $d$. We show that the optimal rates for sampling and log-partition computation are sometimes equal and sometimes faster than for optimization. We then analyze various polynomial-time sampling algorithms, including an extension of a recent promising optimization approach, and find that they sometimes exhibit interesting behavior but no near-optimal rates. Our results also give further insights on the relation between sampling, log-partition, and optimization problems.
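The statement that optimization is a low-temperature limit of sampling can be illustrated numerically. The sketch below is not taken from the paper: it assumes a hypothetical one-dimensional potential on the domain $[0, 1]$ and estimates the log-partition function $\log \int_0^1 \exp(-V(x)/\varepsilon)\,dx$ with a plain midpoint rule using $n$ function evaluations. The names `potential` and `log_partition` and the chosen temperatures are illustrative assumptions. Under these assumptions, $-\varepsilon \log Z$ should approach $\min V$ as $\varepsilon$ decreases.

```python
import numpy as np


def potential(x):
    """Hypothetical non-convex potential on [0, 1] with two local minima."""
    return np.cos(4.0 * np.pi * x) + 0.5 * x


def log_partition(V, eps, n):
    """Midpoint-rule estimate of log of the integral of exp(-V(x)/eps) over [0, 1].

    Uses n evaluations of V and the log-sum-exp trick for stability.
    """
    x = (np.arange(n) + 0.5) / n          # midpoints of n equal cells
    a = -V(x) / eps
    a_max = np.max(a)
    return a_max + np.log(np.mean(np.exp(a - a_max)))


n = 2000
grid = (np.arange(n) + 0.5) / n
v_min = float(np.min(potential(grid)))    # grid approximation of min V

temperatures = [1.0, 0.1, 0.01, 0.001]
# Gap between -eps * log Z and min V; it shrinks as eps decreases.
gaps = [-eps * log_partition(potential, eps, n) - v_min for eps in temperatures]

for eps, gap in zip(temperatures, gaps):
    print(f"eps={eps:g}  -eps*log Z - min V = {gap:.4f}")
```

A uniform grid like this needs a number of evaluations that grows exponentially with the dimension $d$ to keep a fixed resolution, which is the curse of dimensionality the abstract refers to; the sketch only makes the one-dimensional relationship between the log-partition function and the minimum concrete.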
