Probit models are useful for modeling correlated discrete responses in many disciplines, including discrete choice data in economics. However, the Gaussian latent variable structure of probit models, coupled with identification constraints, poses significant computational challenges for estimation and inference, especially when the dimension of the discrete response variable is large. In this paper, we propose a computationally efficient Expectation-Maximization (EM) algorithm for estimating large probit models. Our work differs from existing methods in two important respects. First, instead of simulation or sampling methods, we apply and customize expectation propagation (EP), a deterministic method originally proposed for approximate Bayesian inference, to estimate moments of the truncated multivariate normal (TMVN) distribution in the E (expectation) step. Second, we take advantage of a symmetric identification condition to transform the constrained optimization problem in the M (maximization) step into a one-dimensional problem, which is solved efficiently using Newton's method instead of off-the-shelf solvers. Our method enables the analysis of correlated choice data with more than 100 alternatives, a realistic size in modern applications such as online shopping and booking platforms, but one that has been difficult to handle in practice with probit models. We apply our probit estimation method to study ordering effects in hotel search results on Expedia.com.
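To make the E-step idea concrete: EP schemes for truncated Gaussians typically reduce each update to matching the mean and variance of a univariate truncated normal, which has a closed form. The sketch below shows only that univariate building block, not the customized EP procedure of the paper; the function name `lower_truncated_normal_moments` and its parameters are hypothetical and chosen for illustration.

```python
import math


def _std_normal_pdf(z):
    """Density of the standard normal at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _std_normal_cdf(z):
    """CDF of the standard normal at z, written with erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def lower_truncated_normal_moments(mu, sigma, lower):
    """Mean and variance of X ~ N(mu, sigma^2) conditioned on X > lower.

    Illustrative helper (an assumption, not taken from the paper).
    Uses the standard closed-form expressions based on the
    inverse Mills ratio.
    """
    alpha = (lower - mu) / sigma
    # P(Z > alpha) = Phi(-alpha) for a standard normal Z.
    tail = _std_normal_cdf(-alpha)
    lam = _std_normal_pdf(alpha) / tail  # inverse Mills ratio
    mean = mu + sigma * lam
    var = sigma ** 2 * (1.0 + alpha * lam - lam ** 2)
    return mean, var


# Half-normal check: N(0, 1) truncated to X > 0.
mean, var = lower_truncated_normal_moments(0.0, 1.0, 0.0)
```

For the half-normal case used above, the closed forms give a mean of sqrt(2/pi) and a variance of 1 - 2/pi, which provides a quick sanity check. For truncation points far in the upper tail the ratio `lam` becomes numerically delicate, so a production implementation would likely need a more careful evaluation than this sketch.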