Latent variable models are increasingly used in economics for high-dimensional categorical data such as text and surveys. We demonstrate that Hamiltonian Monte Carlo (HMC) with parallelized automatic differentiation can analyze such data in a computationally efficient and methodologically sound manner. Our new model, the Supervised Topic Model with Covariates, shows that carefully modeling this type of data can have significant implications for conclusions compared to a simpler, frequently used, yet methodologically problematic two-step approach. A simulation study and a re-analysis of Bandiera et al.'s (2020) study of executive time use demonstrate these results. The approach accommodates thousands of parameters and does not require custom algorithms specific to each model, making it accessible to applied researchers.