We consider \emph{Gibbs distributions}, which are families of probability distributions over a discrete space $\Omega$ with probability mass function of the form $\mu^\Omega_\beta(\omega) \propto e^{\beta H(\omega)}$ for $\beta$ in an interval $[\beta_{\min}, \beta_{\max}]$ and $H(\omega) \in \{0\} \cup [1, n]$. The \emph{partition function} is the normalization factor $Z(\beta)=\sum_{\omega \in\Omega}e^{\beta H(\omega)}$. Two important parameters of these distributions are the log partition ratio $q = \log \tfrac{Z(\beta_{\max})}{Z(\beta_{\min})}$ and the counts $c_x = |H^{-1}(x)|$. These quantities are closely tied to system parameters in a number of physical applications and sampling algorithms. Our first main result is an algorithm that estimates the counts $c_x$ using roughly $\tilde O(\frac{q}{\varepsilon^2})$ samples for general Gibbs distributions and $\tilde O(\frac{n^2}{\varepsilon^2})$ samples for integer-valued distributions (ignoring some second-order terms and parameters); we show this is optimal up to logarithmic factors. We illustrate this with improved algorithms for counting connected subgraphs and perfect matchings in a graph. A key subroutine estimates the partition function $Z$. Specifically, it generates a data structure that estimates $Z(\beta)$ for \emph{all} values of $\beta$ without further samples. Constructing the data structure requires $O(\frac{q \log n}{\varepsilon^2})$ samples for general Gibbs distributions and $O(\frac{n^2 \log n}{\varepsilon^2} + n \log q)$ samples for integer-valued distributions. This improves over a prior algorithm of Huber (2015), which computes only a single point estimate $Z(\beta_{\max})$ using $O(q \log n (\log q + \log \log n + \varepsilon^{-2}))$ samples. We show matching lower bounds, demonstrating that this complexity is optimal as a function of $n$ and $q$ up to logarithmic terms.
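To make the definitions of $Z(\beta)$, $q$ and $c_x$ concrete, the following is a minimal sketch on a hypothetical toy instance: $\Omega$ is the family of subsets of a 4-element set and $H(\omega) = |\omega|$, so $n = 4$ and $c_x = \binom{4}{x}$. All function names are illustrative assumptions. The sketch checks the identity $Z(\beta) = \sum_x c_x e^{\beta x}$. It then estimates $Z(\beta_{\max})/Z(\beta_{\min})$ with the classic telescoping product estimator, which relies on $Z(\beta')/Z(\beta) = \mathbb{E}_{\omega \sim \mu_\beta}[e^{(\beta'-\beta)H(\omega)}]$. That estimator is a standard baseline shown only for illustration. It is not the data-structure algorithm described above and does not attain its sample complexity. Exact sampling here is done by brute-force enumeration, which is feasible only because $\Omega$ is tiny.

```python
import math
import random
from collections import Counter

# Hypothetical toy instance: Omega = all subsets of {0, 1, 2, 3}, H(omega) = |omega|.
# H takes values in {0} ∪ [1, n] with n = 4.
N_ITEMS = 4
OMEGA = [frozenset(i for i in range(N_ITEMS) if (mask >> i) & 1)
         for mask in range(1 << N_ITEMS)]


def H(omega):
    """Hamiltonian of the toy instance: the size of the subset."""
    return len(omega)


def partition_function(beta):
    """Z(beta) = sum over omega in Omega of exp(beta * H(omega)), by enumeration."""
    return sum(math.exp(beta * H(w)) for w in OMEGA)


def counts():
    """c_x = |H^{-1}(x)| for every value x attained by H."""
    return Counter(H(w) for w in OMEGA)


def z_from_counts(c, beta):
    """Equivalent form of the partition function: Z(beta) = sum_x c_x * exp(beta * x)."""
    return sum(cx * math.exp(beta * x) for x, cx in c.items())


def sample(beta, rng):
    """Exact sample from mu_beta(omega) ∝ exp(beta * H(omega)) (brute force, toy only)."""
    weights = [math.exp(beta * H(w)) for w in OMEGA]
    return rng.choices(OMEGA, weights=weights, k=1)[0]


def estimate_ratio(beta, beta_next, num_samples, rng):
    """Monte Carlo estimate of Z(beta_next) / Z(beta) = E_{mu_beta}[exp((beta_next - beta) * H)]."""
    total = sum(math.exp((beta_next - beta) * H(sample(beta, rng)))
                for _ in range(num_samples))
    return total / num_samples


def telescoping_estimate(schedule, samples_per_step, rng):
    """Estimate Z(schedule[-1]) / Z(schedule[0]) as a product of per-step ratio estimates."""
    ratio = 1.0
    for b, b_next in zip(schedule, schedule[1:]):
        ratio *= estimate_ratio(b, b_next, samples_per_step, rng)
    return ratio


if __name__ == "__main__":
    beta_min, beta_max = 0.0, 1.0
    c = counts()
    print("counts:", dict(sorted(c.items())))  # counts: {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}
    q = math.log(partition_function(beta_max) / partition_function(beta_min))
    rng = random.Random(0)  # fixed seed so the run is reproducible
    est = telescoping_estimate([0.0, 0.25, 0.5, 0.75, 1.0], 5000, rng)
    print("exact q:", q, "estimated q:", math.log(est))
```

The cooling schedule `[0.0, 0.25, 0.5, 0.75, 1.0]` and the 5000 samples per step are arbitrary choices for this toy instance. Smaller steps keep each ratio $e^{(\beta'-\beta)H}$ close to 1 and hence of low relative variance, at the cost of more steps.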