Probit models are useful for modeling correlated discrete responses in many disciplines, including consumer choice data in economics and marketing. However, the Gaussian latent variable structure of probit models, coupled with identification constraints, poses significant computational challenges for estimation and inference, especially when the dimension of the discrete response variable is large. In this paper, we propose a computationally efficient Expectation-Maximization (EM) algorithm for estimating large probit models. Our work differs from existing methods in two important respects. First, instead of simulation or sampling methods, we apply and customize expectation propagation (EP), a deterministic method originally proposed for approximate Bayesian inference, to estimate moments of the truncated multivariate normal (TMVN) distribution in the E (expectation) step. Second, we take advantage of a symmetric identification condition to transform the constrained optimization problem in the M (maximization) step into a one-dimensional problem, which is solved efficiently using Newton's method instead of off-the-shelf solvers. Our method enables the analysis of correlated choice data with more than 100 alternatives, a realistic size in modern applications such as online shopping and booking platforms, but one that has been difficult to handle in practice with probit models. We apply our probit estimation method to study ordering effects in hotel search results on Expedia's online booking platform.
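To make the E-step quantity concrete, the sketch below computes the mean and variance of a univariate normal truncated from below, using the standard closed-form expressions. EP schemes for truncated Gaussians commonly match moments of this one-dimensional kind, but treating it as the building block of the EP customization described above is an assumption here, and the function names are hypothetical.

```python
import math


def std_normal_pdf(z):
    """Density of the standard normal distribution at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """CDF of the standard normal, via erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def truncated_normal_moments(mu, sigma, lower):
    """Mean and variance of X ~ N(mu, sigma^2) conditioned on X > lower.

    Illustrative helper (hypothetical name), not the paper's implementation.
    """
    alpha = (lower - mu) / sigma          # standardized truncation point
    tail = std_normal_cdf(-alpha)         # P(Z > alpha)
    if tail <= 0.0:
        raise ValueError("truncation point is too far in the upper tail")
    lam = std_normal_pdf(alpha) / tail    # inverse Mills ratio
    mean = mu + sigma * lam
    var = sigma ** 2 * (1.0 - lam * (lam - alpha))
    return mean, var


# Standard normal truncated to the positive half-line.
mean_pos, var_pos = truncated_normal_moments(0.0, 1.0, 0.0)
```

For a standard normal truncated at zero, the formulas reduce to a mean of sqrt(2/pi) and a variance of 1 - 2/pi, which gives a quick sanity check on the helper.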