Parameter estimation for integer-valued Gibbs distributions

A central problem in computational statistics is to convert a procedure for sampling combinatorial objects into a procedure for counting those objects, and vice versa. We consider sampling problems coming from *Gibbs distributions*, which are probability distributions of the form $\mu^\Omega_\beta(\omega) \propto e^{\beta H(\omega)}$ for $\beta$ in an interval $[\beta_\min, \beta_\max]$ and $H(\omega) \in \{0\} \cup [1, n]$. The *partition function* is the normalization factor $Z(\beta)=\sum_{\omega \in\Omega}e^{\beta H(\omega)}$. Two important parameters are the log partition ratio $q = \log \tfrac{Z(\beta_\max)}{Z(\beta_\min)}$ and the vector of counts $c_x = |H^{-1}(x)|$. Our first result is an algorithm to estimate the counts $c_x$ using roughly $\tilde O(\frac{q}{\epsilon^2})$ samples for general Gibbs distributions and $\tilde O(\frac{n^2}{\epsilon^2})$ samples for integer-valued distributions (ignoring some second-order terms and parameters). We show this is optimal up to logarithmic factors. We illustrate with improved algorithms for counting connected subgraphs and perfect matchings in a graph. We develop a key subroutine for global estimation of the partition function. Specifically, we produce a data structure to estimate $Z(\beta)$ for *all* values $\beta$, without further samples. Constructing the data structure requires $O(\frac{q \log n}{\epsilon^2})$ samples for general Gibbs distributions and $O(\frac{n^2 \log n}{\epsilon^2} + n \log q)$ samples for integer-valued distributions. This improves over a prior algorithm of Kolmogorov (2018), which computes the single point estimate $Z(\beta_\max)$ using $\tilde O(\frac{q}{\epsilon^2})$ samples. We also show that this complexity is optimal as a function of $n$ and $q$ up to logarithmic terms.
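To make the definitions concrete, the following is a minimal sketch on a toy instance, not the algorithm of the paper. It assumes $\Omega$ is the set of subsets of an $n$-element set with $H(\omega) = |\omega|$, so that $c_x = \binom{n}{x}$ and, grouping the sum by the value of $H$, $Z(\beta) = \sum_x c_x e^{\beta x} = (1 + e^\beta)^n$. The function names, the toy instance, and the naive single-$\beta$ estimator $\hat c_x = \hat p_x \, e^{-\beta x} Z(\beta)$ (which takes $Z(\beta)$ as known) are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def partition_function(counts, beta):
    """Z(beta) = sum_x c_x * exp(beta * x), where counts maps x -> c_x."""
    return sum(c * math.exp(beta * x) for x, c in counts.items())


def log_partition_ratio(counts, beta_min, beta_max):
    """q = log(Z(beta_max) / Z(beta_min))."""
    z_max = partition_function(counts, beta_max)
    z_min = partition_function(counts, beta_min)
    return math.log(z_max / z_min)


def sample_energies(counts, beta, num_samples, rng):
    """Draw H(omega) for omega ~ mu_beta.

    Hypothetical stand-in for a sampling oracle: it uses the true counts,
    which a real application would not have access to.
    """
    xs = list(counts)
    weights = [counts[x] * math.exp(beta * x) for x in xs]
    return rng.choices(xs, weights=weights, k=num_samples)


def estimate_counts(samples, beta, z_beta):
    """Naive estimator: c_x ~= p_hat(x) * exp(-beta * x) * Z(beta).

    Assumes Z(beta) is known; values of x never sampled get no estimate.
    """
    freq = Counter(samples)
    total = len(samples)
    return {x: (k / total) * math.exp(-beta * x) * z_beta
            for x, k in freq.items()}


if __name__ == "__main__":
    n = 10
    counts = {x: math.comb(n, x) for x in range(n + 1)}  # toy instance
    rng = random.Random(0)
    beta = 0.0
    samples = sample_energies(counts, beta, 20000, rng)
    estimates = estimate_counts(samples, beta, partition_function(counts, beta))
    print("q on [0, 1]:", log_partition_ratio(counts, 0.0, 1.0))
    print("estimated c_5:", estimates.get(5), "true c_5:", counts[5])
```

A single value of $\beta$ only gives good estimates for values $x$ that carry substantial probability under $\mu_\beta$, which is one way to see why estimating every $c_x$ calls for samples at several values of $\beta$ and for estimates of $Z(\beta)$ across the whole interval.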
