Non-probability survey samples have become increasingly popular data sources in recent years, including in official statistics. However, statistical inference based on non-probability samples is much more difficult because such samples are biased and not representative of the target population (Wu, 2022). In this paper we consider a method of joint calibration for totals (Deville & Särndal, 1992) and quantiles (Harms & Duchesne, 2006) and use it to extend existing inference methods for non-probability samples, such as inverse probability weighting, mass imputation and doubly robust estimators. Including quantile information in the estimation process makes it possible to approximate non-linear relationships between the target and auxiliary variables, in the same way as in step-wise (piecewise constant) regression. Our simulation study demonstrates that the proposed estimators are more robust against model mis-specification and, as a result, help to reduce bias and improve estimation efficiency. Variance estimation for the proposed approach is also discussed: we show that existing inference methods can be used and that the resulting confidence intervals achieve nominal coverage levels. Finally, we apply the proposed methods to estimate the share of vacancies aimed at Ukrainian workers in Poland using an integrated set of administrative and survey data about job vacancies. The proposed approaches are implemented in two R packages (nonprobsvy and jointCalib), which were used to conduct the simulation and empirical study.
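To make the idea of joint calibration for totals and quantiles concrete, the following is a minimal sketch in plain Python, not the implementation in nonprobsvy or jointCalib. It assumes linear (chi-square distance) calibration and a simple step indicator 1(x <= Q_alpha) for the quantile constraint, which is a simplification chosen for illustration; the cited quantile calibration work may use a smoothed or interpolated form instead. The function names (solve, joint_calibrate) and the toy data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def joint_calibrate(x, d, pop_size, pop_total, pop_quantiles):
    """Linear calibration of base weights d to N, the total of x, and quantiles of x.

    pop_quantiles maps a probability alpha to the known population quantile Q_alpha.
    """
    # One row per unit: intercept, x, and one step indicator 1(x <= Q_alpha) per quantile.
    rows = [[1.0, xi] + [1.0 if xi <= q else 0.0 for q in pop_quantiles.values()]
            for xi in x]
    # Targets: population size, total of x, and alpha * N for each quantile.
    targets = [pop_size, pop_total] + [a * pop_size for a in pop_quantiles]
    p = len(targets)
    # Calibration equations: (sum_i d_i z_i z_i') lambda = targets - sum_i d_i z_i
    A = [[sum(di * z[j] * z[k] for di, z in zip(d, rows)) for k in range(p)]
         for j in range(p)]
    b = [targets[j] - sum(di * z[j] for di, z in zip(d, rows)) for j in range(p)]
    lam = solve(A, b)
    # Calibrated weights w_i = d_i * (1 + z_i' lambda); these can be negative.
    return [di * (1.0 + sum(l * zj for l, zj in zip(lam, z)))
            for di, z in zip(d, rows)]


# Toy data (hypothetical): 8 sampled units with equal base weights.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
d = [10.0] * 8
w = joint_calibrate(x, d, pop_size=100.0, pop_total=500.0, pop_quantiles={0.5: 5.0})
```

After calibration the weights sum to the assumed population size, reproduce the assumed total of x, and place half of the weighted mass at or below the assumed population median. The indicator columns play the role of the steps in a piecewise constant regression, which is how quantile constraints can capture non-linear relationships.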