Statistical inference with complex survey data in the presence of nuisance functionals is an important topic in social and economic studies. The Gini index, Lorenz curves and quantile shares are commonly encountered examples. The nuisance functionals are usually handled by a plug-in nonparametric estimator, and the main inferential procedure can then be carried out through a two-step generalized empirical likelihood method. Unfortunately, the resulting inference is not efficient, and the nonparametric version of Wilks' theorem breaks down even under simple random sampling. We propose an augmented estimating equations method for settings with nuisance functionals and complex survey designs. The second-step augmented estimating functions satisfy the Neyman orthogonality condition and automatically account for the impact of the first-step plug-in estimator, and the resulting estimator of the main parameters of interest is invariant to the choice of first-step method. More importantly, the generalized empirical likelihood-based Wilks' theorem holds for the main parameters of interest under the design-based framework for commonly used survey designs, and the maximum generalized empirical likelihood estimators achieve the semiparametric efficiency bound. The performance of the proposed methods is demonstrated through simulation studies and an application to data from the New York City Social Indicators Survey.
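As a concrete illustration of the conventional two-step plug-in approach described above, the following sketch estimates the Gini index from survey data. It assumes one common estimating-equation representation of the Gini index, E[2Y F(Y) - (G + 1)Y] = 0, where F is the population distribution function (the nuisance functional). It also assumes that design weights d_i are supplied directly. The function names are hypothetical. Step 1 estimates F by a design-weighted mid-distribution function, and step 2 plugs this estimate into the estimating equation and solves for G. This is the plain plug-in estimator, not the augmented, Neyman-orthogonal method proposed here, and it performs no variance estimation or empirical likelihood inference.

```python
from itertools import groupby


def weighted_mid_cdf(y, d):
    """Step 1 (nuisance functional): design-weighted mid-distribution function
    F_mid(y_i) = [sum_j d_j 1(y_j < y_i) + 0.5 * sum_j d_j 1(y_j == y_i)] / sum_j d_j,
    evaluated at every sample value. Using mid-ranks keeps tied values from
    biasing the Gini estimate (all-equal data give G = 0)."""
    total = float(sum(d))
    order = sorted(range(len(y)), key=lambda i: y[i])
    f_mid = [0.0] * len(y)
    below = 0.0  # total weight strictly below the current tie group
    for _, grp in groupby(order, key=lambda i: y[i]):
        idx = list(grp)
        tie_w = sum(d[i] for i in idx)
        for i in idx:
            f_mid[i] = (below + 0.5 * tie_w) / total
        below += tie_w
    return f_mid


def gini_two_step(y, d):
    """Step 2: plug the estimated F into the (assumed) estimating equation
    sum_i d_i * [2 * y_i * F(y_i) - (G + 1) * y_i] = 0.
    The equation is linear in G, so it has a closed-form solution.
    No adjustment is made for first-step estimation error."""
    f_hat = weighted_mid_cdf(y, d)
    num = sum(di * yi * fi for yi, di, fi in zip(y, d, f_hat))
    den = sum(di * yi for yi, di in zip(y, d))
    return 2.0 * num / den - 1.0
```

For example, `gini_two_step([1, 2, 3, 4], [1, 1, 1, 1])` gives 0.25 up to floating-point rounding. This matches the usual mean-difference form of the sample Gini index. Because the second step treats the plug-in estimate of F as if it were known, naive inference built on this estimating equation ignores the first-step variability. The augmented estimating functions are designed to correct this problem.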