Crossed random effects structures arise in many scientific contexts. They raise severe computational problems, with likelihood and Bayesian computations scaling like $N^{3/2}$ or worse for $N$ data points. In this paper we develop a composite likelihood approach for crossed random effects probit models. For data arranged in rows and columns, we use three likelihoods: the first uses the marginal distributions of the responses as if they were independent, the second uses a hierarchical model that captures all within-row dependence as if the rows were independent, and the third reverses the roles of rows and columns. We find that this method has a cost that grows as $\mathrm{O}(N)$ in crossed random effects settings where the Laplace approximation has a cost that grows superlinearly. We show how to obtain consistent estimates of the probit slope and variance components by maximizing those three likelihoods. The algorithm scales readily to a data set of five million observations from Stitch Fix.
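To make the first of the three likelihoods concrete, the following is a minimal sketch, not the paper's implementation. It assumes a specific model form that the abstract does not spell out: a latent response $\beta x_{ij} + a_i + b_j + e_{ij}$ with independent $a_i \sim N(0, \sigma_A^2)$, $b_j \sim N(0, \sigma_B^2)$ and $e_{ij} \sim N(0, 1)$, a single covariate, and no intercept. Under that assumption each response is marginally probit with the scaled slope $\gamma = \beta / \sqrt{1 + \sigma_A^2 + \sigma_B^2}$, so the marginal likelihood is a sum over observations and costs $\mathrm{O}(N)$ per evaluation. All function names (`simulate`, `marginal_loglik`, `golden_max`) are hypothetical, and the golden-section search is a stand-in optimizer chosen only to keep the example dependency-free.

```python
import math
import random


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate(n_rows, n_cols, beta, sigma_a, sigma_b, p_obs, rng):
    """Simulate sparse crossed probit data (assumed model form, see lead-in).

    Returns a list of (row, col, x, y) tuples for the observed cells.
    """
    a = [rng.gauss(0.0, sigma_a) for _ in range(n_rows)]  # row effects
    b = [rng.gauss(0.0, sigma_b) for _ in range(n_cols)]  # column effects
    data = []
    for i in range(n_rows):
        for j in range(n_cols):
            if rng.random() < p_obs:  # each cell observed with prob. p_obs
                x = rng.gauss(0.0, 1.0)
                latent = beta * x + a[i] + b[j] + rng.gauss(0.0, 1.0)
                data.append((i, j, x, 1 if latent > 0.0 else 0))
    return data


def marginal_loglik(gamma, data):
    """Marginal probit log-likelihood treating responses as independent.

    One pass over the observations, so the cost is O(N).
    """
    total = 0.0
    for _, _, x, y in data:
        p = norm_cdf(gamma * x)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p) if y else math.log(1.0 - p)
    return total


def golden_max(f, lo, hi, tol=1e-5):
    """Golden-section search for the maximizer of a unimodal function."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:  # maximizer lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:  # maximizer lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    rng = random.Random(0)
    data = simulate(100, 100, beta=1.0, sigma_a=1.0, sigma_b=1.0,
                    p_obs=0.3, rng=rng)
    gamma_hat = golden_max(lambda g: marginal_loglik(g, data), -3.0, 3.0)
    print("observations:", len(data))
    print("estimated scaled slope:", gamma_hat)
    print("beta / sqrt(1 + sigma_a^2 + sigma_b^2):", 1.0 / math.sqrt(3.0))
```

In this sketch the marginal likelihood identifies only the scaled slope $\gamma$. Separating $\beta$ from the variance components requires the row-wise and column-wise likelihoods described above, which are not implemented here.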