Crossed random effects structures arise in many scientific contexts. They raise severe computational problems, with likelihood computations scaling like $N^{3/2}$ or worse for $N$ data points. In this paper we develop a new composite likelihood approach for crossed random effects probit models. For data arranged in $R$ rows and $C$ columns, the likelihood function includes a very difficult $(R + C)$-dimensional integral. The composite likelihood we develop uses the marginal distribution of the response along with two hierarchical models. Its cost is reduced to $\mathcal{O}(N)$, and it can be computed with $R + C$ one-dimensional integrals. We find that the commonly used Laplace approximation has a cost that grows superlinearly. Our composite likelihood algorithm yields consistent estimates of the probit slope and variance components. We also show how to estimate the covariance of the estimated regression coefficients. The algorithm scales readily to a data set of five million observations from Stitch Fix with $R + C > 700{,}000$.
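To make the "one-dimensional integrals" concrete, here is a minimal sketch of how one row's contribution could be evaluated. It assumes the standard crossed probit form $\Pr(Y_{ij}=1 \mid a_i, b_j) = \Phi(x_{ij}^\top\beta + a_i + b_j)$ with independent $a_i \sim N(0,\sigma_A^2)$ and $b_j \sim N(0,\sigma_B^2)$, and a row-wise hierarchical model in which the column effect is integrated out of each cell. The abstract does not spell out these details, so the model form, the function name `row_loglik`, its parameters, and the use of Gauss-Hermite quadrature are all illustrative assumptions rather than the paper's actual algorithm.

```python
import numpy as np
from scipy.special import log_ndtr, logsumexp


def row_loglik(y, eta, sigma_a, sigma_b, n_nodes=40):
    """Log-likelihood of one row under an assumed row-wise probit model.

    y       : 0/1 responses observed in the row
    eta     : linear predictors x_ij^T beta for the same cells
    sigma_a : standard deviation of the row effect a_i
    sigma_b : standard deviation of the column effects b_j
    (Hypothetical helper; names are not taken from the paper.)
    """
    y = np.asarray(y, dtype=float)
    eta = np.asarray(eta, dtype=float)

    # Gauss-Hermite rule: integral of exp(-t^2) f(t) dt ~ sum_k w_k f(t_k)
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    a = np.sqrt(2.0) * sigma_a * t          # row-effect values at the nodes

    # Integrating b_j out of a single cell rescales the probit argument.
    scale = np.sqrt(1.0 + sigma_b ** 2)
    z = (eta[:, None] + a[None, :]) / scale

    # log prod_j Phi(+z) or Phi(-z), evaluated at every quadrature node
    sign = 2.0 * y[:, None] - 1.0
    log_p = log_ndtr(sign * z).sum(axis=0)

    # Weighted log-sum-exp; the 1/sqrt(pi) factor normalizes the Gaussian.
    return logsumexp(log_p, b=w) - 0.5 * np.log(np.pi)
```

Under these assumptions, summing `row_loglik` over the $R$ rows and an analogous column function over the $C$ columns uses $R + C$ one-dimensional integrals, and the work is proportional to the number of observed cells for a fixed number of quadrature nodes.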