We develop a distribution regression model under endogenous sample selection. This model is a semi-parametric generalization of the Heckman selection model. It accommodates much richer effects of the covariates on the outcome distribution and richer patterns of heterogeneity in the selection process, and it allows for drastic departures from the Gaussian error structure, while maintaining the same level of tractability as the classical model. The model applies to continuous, discrete, and mixed outcomes. We provide identification, estimation, and inference methods, and apply them to obtain a wage decomposition for the UK. Here we decompose the difference between the male and female wage distributions into composition, wage structure, selection structure, and selection sorting effects. After controlling for endogenous employment selection, we still find a substantial gender wage gap, ranging from 21% to 40% throughout the (latent) offered wage distribution, that is not explained by composition. We also uncover positive sorting for single men and negative sorting for married women that accounts for a substantial fraction of the gender wage gap at the top of the distribution.