Heterogeneity in multinomial choice data is often accounted for using logit models with random coefficients. Such models are called "mixed" logit models, and they can be difficult to estimate for large datasets. We review current Bayesian variational inference (VI) methods that can do so, and propose a new VI method that scales more effectively. The key innovation is a step that efficiently updates a Gaussian approximation to the conditional posterior of the random coefficients, addressing a bottleneck within the variational optimization. The approach is used to estimate three types of mixed logit models: standard, nested and bundle variants. We first demonstrate the improvement of our new approach over existing VI methods using simulations. Our method is then applied to a large scanner panel dataset of pasta choice. We find that consumer response to price and promotion variables exhibits substantial heterogeneity at the grocery store and product levels. Store size, premium and geography are found to be drivers of store-level estimates of price elasticities. Extending the model to bundle choice with pasta sauce improves model accuracy further. Predictions from the mixed models are more accurate than those from fixed-coefficient equivalents, and our VI method provides insights in circumstances that other methods find challenging.