Heterogeneity in multinomial choice data is often accounted for using logit models with random coefficients. Such models are called "mixed" logit models, and they can be difficult to estimate for large datasets. We review current Bayesian variational inference (VI) methods that can do so, and propose a new VI method that scales more effectively. The key innovation is a step that efficiently updates a Gaussian approximation to the conditional posterior of the random coefficients, addressing a bottleneck within the variational optimization. The approach is used to estimate three types of mixed logit models: standard, nested and bundle variants. We first demonstrate the improvement of our new approach over existing VI methods using simulations. Our method is then applied to a large scanner panel dataset of pasta choice. We find that consumer response to price and promotion variables exhibits substantial heterogeneity at the grocery store and product levels. Store size, premium and geography are found to be drivers of store-level estimates of price elasticities. Extending the model to bundle choice with pasta sauce further improves model accuracy. Predictions from the mixed models are more accurate than those from their fixed-coefficient equivalents, and our VI method provides insights in circumstances that other methods find challenging.
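To make the term "mixed logit" concrete, the following is a minimal sketch of how choice probabilities under random coefficients can be approximated by simulation. It is not the VI method proposed here. It assumes a Gaussian mixing distribution, and the function names, attribute values and parameter values are all hypothetical, chosen only for illustration.

```python
import numpy as np


def softmax(u):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    u = u - u.max(axis=-1, keepdims=True)
    e = np.exp(u)
    return e / e.sum(axis=-1, keepdims=True)


def mixed_logit_probs(X, mu, Sigma, n_draws=2000, seed=0):
    """Monte Carlo approximation of mixed logit choice probabilities.

    X     : (J, K) attributes of J alternatives (hypothetical example data)
    mu    : (K,)   mean of the random coefficients (assumed Gaussian)
    Sigma : (K, K) covariance of the random coefficients
    """
    rng = np.random.default_rng(seed)
    # Draw coefficient vectors from the assumed Gaussian mixing distribution.
    betas = rng.multivariate_normal(mu, Sigma, size=n_draws)  # (n_draws, K)
    # Linear utility of each alternative under each draw.
    utilities = betas @ X.T  # (n_draws, J)
    # Average the standard logit probabilities over the draws.
    return softmax(utilities).mean(axis=0)


# Three hypothetical products described by (price, promotion indicator).
X = np.array([[1.0, 0.0],
              [1.5, 1.0],
              [2.0, 0.0]])
mu = np.array([-1.0, 0.5])      # illustrative mean price and promotion effects
Sigma = np.diag([0.25, 0.10])   # illustrative heterogeneity in those effects
probs = mixed_logit_probs(X, mu, Sigma)
print(probs)
```

Averaging over draws in this way must be repeated for every unit and every parameter update, which gives a sense of why estimation becomes costly for large panels and why approximate methods such as VI are attractive.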