The Poisson log-normal model is a latent variable model that provides a generic framework for the analysis of multivariate count data. Inferring its parameters can be a daunting task because the conditional distribution of the latent variables given the observed ones is intractable. For this model, variational approaches are the gold-standard solution: they are computationally efficient but lack theoretical guarantees on the estimates. Sampling-based solutions are quite the opposite. We first define a Monte Carlo EM algorithm that yields maximum likelihood estimates but is computationally efficient only for low-dimensional latent spaces. We then propose a novel inference procedure that combines the EM framework with composite likelihood and importance sampling estimates. The algorithm preserves the desirable asymptotic properties of maximum likelihood estimators while circumventing the high-dimensional integration bottleneck, and thus remains computationally feasible for moderately large datasets. This approach enables well-grounded parameter estimation, confidence intervals, and hypothesis testing. Application to the Barents Sea fish dataset demonstrates the algorithm's capacity to identify significant environmental effects and residual interspecies correlations.
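To make the intractable integral concrete, the following is a minimal sketch of an importance-sampling estimate of the log-likelihood of a single observation. It assumes the usual formulation of the model, Z ~ N(0, Sigma) and Y_j | Z ~ Poisson(exp(mu_j + Z_j)), with mu the fixed-effect part of the linear predictor. The proposal distribution is simply the prior N(0, Sigma), a simplifying assumption chosen for illustration and not necessarily the proposal used in the paper. The function names (`cholesky`, `log_poisson_pmf`, `is_log_likelihood`) are hypothetical.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L^T = S (S symmetric positive definite)."""
    p = len(S)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L


def log_poisson_pmf(y, log_rate):
    # log P(Y = y) for a Poisson variable with rate exp(log_rate)
    return y * log_rate - math.exp(log_rate) - math.lgamma(y + 1)


def is_log_likelihood(y, mu, Sigma, n_draws=5000, seed=0):
    """Importance-sampling estimate of log p(y) for one observation.

    Assumed model: Z ~ N(0, Sigma), Y_j | Z ~ Poisson(exp(mu_j + Z_j)).
    The proposal is the prior N(0, Sigma) (an illustrative assumption),
    so each importance weight reduces to p(y | z).
    """
    rng = random.Random(seed)
    L = cholesky(Sigma)
    p = len(y)
    log_w = []
    for _ in range(n_draws):
        # Draw z = L @ eps with eps standard normal, so that z ~ N(0, Sigma)
        eps = [rng.gauss(0.0, 1.0) for _ in range(p)]
        z = [sum(L[j][k] * eps[k] for k in range(j + 1)) for j in range(p)]
        log_w.append(sum(log_poisson_pmf(y[j], mu[j] + z[j]) for j in range(p)))
    # Average the weights on the log scale (log-sum-exp for stability)
    m = max(log_w)
    return m + math.log(sum(math.exp(lw - m) for lw in log_w) / n_draws)


# Example: two species with correlated latent effects
estimate = is_log_likelihood(
    y=[3, 0], mu=[1.0, -0.5], Sigma=[[1.0, 0.5], [0.5, 1.0]]
)
```

With the prior as proposal, the weights tend to degenerate as the latent dimension grows, which is one way to see why plain sampling-based estimates become expensive outside low-dimensional settings.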