Revisiting McFadden's correction factor for sampling of alternatives in multinomial logit and mixed multinomial logit models

In this paper, we revisit McFadden's (1978) correction factor for sampling of alternatives in multinomial logit (MNL) and mixed multinomial logit (MMNL) models. McFadden (1978) proved that consistent parameter estimates are obtained when an MNL model is estimated on a sampled subset of alternatives that includes the chosen alternative, provided a correction factor is applied. We decompose this correction factor into i) a correction for overestimating the MNL choice probability due to using a smaller subset of alternatives, and ii) a correction for which subset of alternatives is contrasted through utility differences, and thereby for the extent to which we learn about the parameters of interest in the MNL model. Keane and Wasi (2016) proved that the overall expected positive information divergence between the true and sampled likelihoods, which comprises the above two elements, is minimised when the sampling protocol satisfies uniform conditioning. We generalise their result to the case of positive conditioning and show that, whilst McFadden's (1978) correction factor may not minimise the overall expected information divergence, it does minimise the expected information loss with respect to the parameters of interest. We apply this result in the context of Bayesian analysis and show that McFadden's (1978) correction factor minimises the expected information loss regarding the parameters of interest across the entire posterior density, irrespective of sample size. In other words, McFadden's (1978) correction factor has desirable small- and large-sample properties. We also show that our results for Bayesian MNL models transfer to MMNL models, and that McFadden's (1978) correction factor alone is sufficient to minimise the expected information loss in the parameters of interest. Monte Carlo simulations illustrate the successful application of sampling of alternatives in Bayesian MMNL models.
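
As an illustration of the correction described above, the following minimal sketch assumes the standard form of the corrected probability over a sampled subset D, in which ln π(D | j), the log-probability of drawing D given that alternative j was chosen, is added to each systematic utility V_j before normalising. The function name, argument names, and numerical values are hypothetical and are not taken from the paper.

```python
import math


def sampled_mnl_probs(utilities, log_sampling_probs):
    """MNL choice probabilities over a sampled subset D with a sampling correction.

    utilities[j]          -- systematic utility V_j of alternative j in D
    log_sampling_probs[j] -- ln pi(D | j): log-probability of drawing the subset D
                             given that alternative j is the chosen one
    (Both argument names are illustrative assumptions.)
    """
    corrected = [v + lp for v, lp in zip(utilities, log_sampling_probs)]
    shift = max(corrected)  # subtract the maximum for numerical stability
    weights = [math.exp(c - shift) for c in corrected]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical utilities for a sampled subset of three alternatives.
V = [1.0, 0.5, -0.2]

# No correction at all (all correction terms set to zero).
uncorrected = sampled_mnl_probs(V, [0.0, 0.0, 0.0])

# Uniform conditioning: pi(D | j) is identical for every j in D,
# so the correction term cancels out of the ratio.
uniform = sampled_mnl_probs(V, [math.log(0.1)] * 3)

# Positive (non-uniform) conditioning: pi(D | j) differs across j,
# so the correction changes the probabilities.
nonuniform = sampled_mnl_probs(V, [math.log(0.3), math.log(0.1), math.log(0.1)])
```

Under uniform conditioning the correction terms are equal and cancel, so `uniform` coincides with `uncorrected`; under positive conditioning they differ across alternatives and must be retained.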
