We consider the problem of nonparametric inference in multi-dimensional diffusion models from low-frequency data. Statistical analysis in this setting is notoriously challenging due to the intractability of the likelihood and its gradient, and computational methods have thus far largely resorted to expensive simulation-based techniques. In this article, we propose a new computational approach which is motivated by PDE theory and is built around the characterisation of the transition densities as solutions of the associated heat (Fokker-Planck) equation. Employing optimal regularity results from the theory of parabolic PDEs, we prove a novel characterisation of the gradient of the likelihood. Using these developments, for the nonlinear inverse problem of recovering the diffusivity (in divergence form models), we then show that the numerical evaluation of the likelihood and its gradient can be reduced to standard elliptic eigenvalue problems, solvable by powerful finite element methods. This enables the efficient implementation of a large class of statistical algorithms, including (i) preconditioned Crank-Nicolson and Langevin-type methods for posterior sampling, and (ii) gradient-based descent optimisation schemes to compute maximum likelihood and maximum-a-posteriori estimates. We showcase the effectiveness of these methods via extensive simulation studies in a nonparametric Bayesian model with Gaussian process priors. Interestingly, the optimisation schemes provided satisfactory numerical recovery while exhibiting rapid convergence towards stationary points despite the nonlinearity of the problem; thus our approach may lead to significant computational speed-ups. The reproducible code is available online at https://github.com/MattGiord/LF-Diffusion.
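To make the posterior sampling step mentioned in the abstract concrete, the following is a minimal sketch of a generic preconditioned Crank-Nicolson (pCN) sampler under a centred Gaussian prior. It is not taken from the authors' repository: the names `pcn_sampler`, `log_lik`, `prior_chol` and `beta` are hypothetical, and the toy Gaussian log-likelihood only stands in for the PDE-based likelihood, which in the setting of the abstract would be evaluated through an elliptic eigenvalue solve.

```python
import numpy as np


def pcn_sampler(log_lik, prior_chol, theta0, n_steps, beta=0.2, seed=0):
    """Generic pCN sampler for a N(0, C) prior with C = prior_chol @ prior_chol.T.

    log_lik    : callable returning the log-likelihood of a parameter vector
                 (hypothetical placeholder for a PDE-based likelihood).
    prior_chol : lower-triangular Cholesky factor of the prior covariance.
    beta       : step size in (0, 1]; smaller values give more local moves.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    ll = log_lik(theta)
    chain = np.empty((n_steps, theta.size))
    n_accept = 0
    for k in range(n_steps):
        # Draw xi ~ N(0, C) and form the prior-preserving pCN proposal.
        xi = prior_chol @ rng.standard_normal(theta.size)
        proposal = np.sqrt(1.0 - beta**2) * theta + beta * xi
        ll_prop = log_lik(proposal)
        # The acceptance ratio involves the likelihood only, since the
        # proposal is reversible with respect to the Gaussian prior.
        if np.log(rng.uniform()) < ll_prop - ll:
            theta, ll = proposal, ll_prop
            n_accept += 1
        chain[k] = theta
    return chain, n_accept / n_steps


# Toy usage: a Gaussian log-likelihood as an assumed stand-in for the
# diffusion likelihood, with an identity prior covariance.
y_obs = np.array([1.0, -0.5])


def toy_log_lik(theta):
    return -0.5 * np.sum((y_obs - theta) ** 2) / 0.25


chain, acc_rate = pcn_sampler(toy_log_lik, np.eye(2), np.zeros(2), n_steps=2000)
```

Because the pCN proposal leaves the Gaussian prior invariant, the accept/reject step only requires likelihood evaluations, which is why an efficient likelihood solver translates directly into a cheaper sampler. Langevin-type variants would additionally use the gradient of the log-likelihood in the proposal.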