We develop a distribution regression model under endogenous sample selection. This model is a semi-parametric generalization of the Heckman selection model. It accommodates much richer effects of the covariates on the outcome distribution and patterns of heterogeneity in the selection process, and allows for drastic departures from the Gaussian error structure, while maintaining the same level of tractability as the classical model. The model applies to continuous, discrete, and mixed outcomes. We provide identification, estimation, and inference methods, and apply them to obtain a wage decomposition for the UK. Here, we decompose the difference between the male and female wage distributions into composition, wage structure, selection structure, and selection sorting effects. After controlling for endogenous employment selection, we still find a substantial gender wage gap, ranging from 21\% to 40\% throughout the (latent) offered wage distribution, that is not explained by composition. We also uncover positive sorting for single men and negative sorting for married women that accounts for a substantive fraction of the gender wage gap at the top of the distribution.