Scalable solution to crossed random effects model with random slopes

The crossed random-effects model is widely used in applied statistics, with applications in longitudinal studies, e-commerce, and recommender systems, among others. However, these models scale poorly: computational time grows superlinearly with the number of data points $N$, typically as $N^{3/2}$ or worse. Our motivation comes from the recommender system of an online clothing retailer. Our dataset comprises over 700,000 clients, 5,000 items, and 5,000,000 measurements. At this scale, fitting crossed random effects by maximum likelihood becomes computationally inefficient, which limits the applicability of the approach in large-scale settings. To address the scalability issues, Ghosh et al. (2022a, 2022b) studied linear and logistic regression models with fixed-effect features based on client and item variables and with random intercept terms for clients and items. In this study, we present a more general version of the problem that allows random slopes (random effect sizes). This extension lets us capture the variability in effect size among both clients and items. We develop a scalable solution to this problem and demonstrate empirically that our estimates are consistent: as the number of data points increases, the estimates converge to the true parameters. To validate our approach, we apply the proposed algorithm to Stitch Fix data.
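To make the model class concrete, the following is a minimal simulation sketch of a crossed random-effects model with random slopes, in which each observation's coefficient vector is the sum of a fixed effect, a client-specific deviation, and an item-specific deviation. The function name, the dimensions, the variances, and the independent Gaussian form of the random effects are illustrative assumptions, not details taken from the paper, and the sketch only generates data; it does not implement the proposed fitting algorithm.

```python
import numpy as np


def simulate_crossed_random_slopes(n_clients=50, n_items=20, n_obs=500, p=3, seed=0):
    """Simulate y = x^T (beta + a_client + b_item) + noise (hypothetical setup)."""
    rng = np.random.default_rng(seed)

    beta = rng.normal(size=p)                       # fixed effects (assumed values)
    a = rng.normal(scale=0.5, size=(n_clients, p))  # client-level random slopes
    b = rng.normal(scale=0.3, size=(n_items, p))    # item-level random slopes

    # Each observation pairs one client with one item (crossed, not nested).
    client = rng.integers(0, n_clients, size=n_obs)
    item = rng.integers(0, n_items, size=n_obs)

    # The first column is a constant, so its random slope acts as a random intercept.
    X = np.column_stack([np.ones(n_obs), rng.normal(size=(n_obs, p - 1))])

    # Per-observation coefficients: fixed effect plus client and item deviations.
    coef = beta + a[client] + b[item]
    y = np.einsum("ij,ij->i", X, coef) + rng.normal(scale=1.0, size=n_obs)
    return X, y, client, item, beta


X, y, client, item, beta = simulate_crossed_random_slopes()
print(X.shape, y.shape)
```

Setting the slope deviations of every non-constant column to zero recovers the random-intercept setting studied by Ghosh et al. (2022a, 2022b).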

