The recent proliferation of computers and the internet has opened new opportunities for collecting and processing data. However, such data are often obtained without a well-planned probability survey design, and the resulting non-probability samples cannot automatically be regarded as representative of the population of interest. Several classes of methods for estimation and inference from non-probability samples have been developed in recent years. Quasi-randomization methods assume that non-probability sample selection is governed by an underlying latent random mechanism. The basic idea is to use information collected from a probability ("reference") sample to uncover the latent non-probability survey participation probabilities (also known as "propensity scores") and then use them to estimate target finite population parameters. In this paper, we review and compare the theoretical properties of recently developed methods for estimating survey participation probabilities and study their relative performance in simulations.