Crossed random effects models are widely used in fields such as longitudinal studies, e-commerce, and recommender systems. However, these models face scalability challenges: the computational time of standard algorithms grows superlinearly with the number $N$ of observations in the data set, commonly as $\Omega(N^{3/2})$ or worse. Recent work has developed scalable methods for crossed random effects in linear models and some generalized linear models, but those methods allow only random intercepts. In this paper we devise scalable algorithms for models that include random slopes. The substantial difficulty in this setting is estimating the random effect covariance matrices in a scalable way. We address that issue by using a variational EM algorithm. In simulations, the proposed method is faster than standard methods. It is also more efficient than ordinary least squares, which has the further problem of greatly underestimating the sampling uncertainty in parameter estimates. We illustrate the new method on a large dataset (five million observations) from the online retailer Stitch Fix.
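For orientation, a crossed random effects model with random slopes is often written in the following form. This is a sketch under assumed notation, not a formula taken from the abstract: the symbols $x_{ij}$, $\beta$, $a_i$, $b_j$, $e_{ij}$, $\Sigma_a$, $\Sigma_b$, and $\sigma^2$ are all illustrative choices, and the Gaussian assumptions are the conventional ones rather than anything stated here.

```latex
% Assumed notation: i indexes one factor (e.g. customers), j the other (e.g. items);
% x_{ij} is a p-vector of predictors whose first entry is 1 (the intercept).
\begin{align*}
  Y_{ij} &= x_{ij}^{\top}\,(\beta + a_i + b_j) + e_{ij}, \\
  a_i &\sim \mathcal{N}(0, \Sigma_a), \qquad
  b_j \sim \mathcal{N}(0, \Sigma_b), \qquad
  e_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{align*}
% Random-intercept special case: only the first components of a_i and b_j are
% nonzero, so \Sigma_a and \Sigma_b each reduce to a single variance parameter.
```

In this notation, allowing random slopes turns $\Sigma_a$ and $\Sigma_b$ from scalar variances into full $p \times p$ covariance matrices, which is the estimation problem the abstract singles out as difficult to scale.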