The Heckman selection model is widely used in econometrics and other social sciences to address sample selection bias. A common assumption in Heckman selection models is that the error terms follow an independent bivariate normal distribution. However, real-world data often deviate from this assumption and exhibit heavy-tailed behavior, which can lead to inconsistent estimates if not properly addressed. In this paper, we propose a Bayesian analysis of Heckman selection models that replaces the Gaussian assumption with well-known members of the class of scale mixtures of normal distributions, such as the Student's-t and contaminated normal distributions. For these complex structures, we use Stan's default No-U-Turn sampler to obtain posterior simulations. Through extensive simulation studies, we compare the performance of Heckman selection models with normal, Student's-t, and contaminated normal distributions. We also demonstrate the broad applicability of this methodology by applying it to medical care and labor supply data. The proposed algorithms are implemented in the R package HeckmanStan.
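To make the setting concrete, the following is a minimal Python sketch of how selection-model data with Student's-t errors could be simulated using the scale-mixture-of-normals representation. It is not the HeckmanStan implementation: the function name `simulate_heckman_t`, the regression coefficients, and the covariates `x` and `w` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def simulate_heckman_t(n=5000, rho=0.7, nu=4.0, seed=0):
    """Simulate a selection model with bivariate Student's-t errors.

    All coefficients below are hypothetical, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)  # covariate in both equations
    w = rng.normal(size=n)  # covariate in the selection equation only

    # Correlated bivariate normal draws with unit variances.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)

    # Scale mixture of normals: U ~ Gamma(nu/2, rate = nu/2), so that
    # z / sqrt(U) follows a bivariate Student's-t with nu degrees of freedom.
    u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    eps = z / np.sqrt(u)[:, None]

    y_star = 1.0 + 0.5 * x + eps[:, 0]            # latent outcome
    s_star = 0.5 + 1.0 * x + 1.0 * w + eps[:, 1]  # latent selection index
    selected = s_star > 0                         # outcome observed only if True

    return {
        "x": x,
        "w": w,
        "y": np.where(selected, y_star, np.nan),
        "selected": selected,
        "eps_outcome": eps[:, 0],
    }
```

With a positive correlation `rho`, the outcome errors of the selected observations have a positive mean even though the errors have mean zero in the full sample. This is the selection bias that the model is designed to correct.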