A composite likelihood is an inference function derived by multiplying a set of likelihood components. This approach provides a flexible framework for drawing inference when the likelihood function of a statistical model is computationally intractable. Although composite likelihood has computational advantages, it can still be demanding when the number of likelihood components and the sample size are both large. This paper tackles this challenge with an approximation of the conventional composite likelihood estimator, obtained from an optimization procedure that relies on stochastic gradients. The resulting estimator is shown to be asymptotically normally distributed around the true parameter. In particular, depending on the relative rates at which the sample size and the number of optimization iterations diverge, the variance of the limiting distribution is shown to combine two sources of uncertainty: the sampling variability of the data and the optimization noise, with the latter depending on the sampling distribution used to construct the stochastic gradients. The advantages of the proposed framework are illustrated through simulation studies on two working examples: an Ising model for binary data and a gamma frailty model for count data. Finally, a real-data application to a large-scale mental health survey demonstrates the effectiveness of the method.
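As a rough illustration of the idea of approximating a composite likelihood estimator with stochastic gradients, the sketch below uses a deliberately simple toy model that is an assumption of this example and not one of the paper's working examples: every column of the data shares a common mean `theta`, and each column contributes a univariate N(theta, 1) marginal likelihood component. At each iteration one unit and a few components are sampled uniformly at random, a gradient step is taken, and the iterates are averaged. The function name `sgd_composite_mean`, the step-size schedule, and the uniform sampling scheme are all hypothetical choices, not the authors' algorithm.

```python
import random


def sgd_composite_mean(data, n_iter=20000, n_components=2, seed=0):
    """Approximate the composite likelihood estimate of a common mean.

    Toy working model (an assumption of this sketch): column j contributes
    the marginal log-likelihood of N(theta, 1), so the score of component j
    for unit i is data[i][j] - theta.
    """
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    theta, theta_bar = 0.0, 0.0
    for t in range(1, n_iter + 1):
        i = rng.randrange(n)                        # sample one unit
        cols = rng.sample(range(p), n_components)   # sample a few components
        # Stochastic gradient: average score over the sampled components.
        grad = sum(data[i][j] - theta for j in cols) / n_components
        theta += grad / t ** 0.6                    # decaying step size (ascent)
        theta_bar += (theta - theta_bar) / t        # running average of iterates
    return theta_bar
```

In this toy model the exact composite likelihood estimate is the overall mean of all entries, so the averaged iterate can be compared against it directly. The gap between the two reflects the optimization noise, which is separate from the sampling variability of the data.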