Although Hamiltonian Monte Carlo (HMC) scales as O(d^(1/4)) in dimension, there is a large constant factor determined by the curvature of the target density. This constant factor can be reduced in most cases through preconditioning, the state of the art for which uses diagonal or dense penalized maximum likelihood estimation of (co)variance based on a sample of warmup draws. These estimates converge slowly in the diagonal case and scale poorly when expanded to the dense case. We propose a more effective estimator based on minimizing the sample Fisher divergence from a linearly transformed density to a standard normal distribution. We present this estimator in three forms, (a) diagonal, (b) dense, and (c) low-rank plus diagonal. Using a collection of 114 models from posteriordb, we demonstrate that the diagonal minimizer of Fisher divergence outperforms the industry-standard variance-based diagonal estimators used by Stan and PyMC by a median factor of 1.3. The low-rank plus diagonal minimizer of the Fisher divergence outperforms Stan and PyMC's diagonal estimators by a median factor of 4.
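To make the diagonal form concrete, here is a minimal sketch under assumptions the abstract does not spell out. Suppose the linear transformation is the coordinate-wise map x = mu + sigma * y, and the sample Fisher divergence is the average over warmup draws of the squared difference between the score of the transformed density, sigma * grad log p(x), and the standard normal score, -y. Minimizing that objective coordinate by coordinate gives sigma_i^2 = sqrt(Var(x_i) / Var(g_i)), where g_i is the i-th component of the gradient of the log density. The function name `fisher_diag_scale` and the toy Gaussian target are hypothetical, chosen only for illustration; the estimator evaluated in the paper may differ, for example by adding regularization.

```python
import random
import statistics


def fisher_diag_scale(draws, scores):
    """Return per-coordinate sigma^2 = sqrt(Var(x_i) / Var(score_i)).

    draws, scores: lists of equal-length rows, one row per warmup draw.
    scores[k] is the gradient of the log density evaluated at draws[k].
    """
    dim = len(draws[0])
    sigma2 = []
    for i in range(dim):
        var_x = statistics.variance([row[i] for row in draws])
        var_g = statistics.variance([row[i] for row in scores])
        sigma2.append((var_x / var_g) ** 0.5)
    return sigma2


# Toy target (an assumption for illustration): independent normals with
# known mean and variance, so the score is -(x - mean) / variance.
rng = random.Random(0)
true_mean = [0.0, 3.0, -1.0]
true_var = [0.01, 1.0, 100.0]

draws = [
    [rng.gauss(m, v ** 0.5) for m, v in zip(true_mean, true_var)]
    for _ in range(10)
]
scores = [
    [-(x - m) / v for x, m, v in zip(row, true_mean, true_var)]
    for row in draws
]

sigma2 = fisher_diag_scale(draws, scores)
print(sigma2)  # matches true_var up to floating-point rounding
```

For this Gaussian toy target the score is an exact linear function of the draw, so the ratio of variances recovers the true variances from only ten draws, whereas a plain sample variance of the draws would still carry sampling noise. This is consistent with, though not a demonstration of, the abstract's claim that variance-based diagonal estimates converge slowly.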