The recent proliferation of computers and the internet has opened new opportunities for collecting and processing data. However, such data are often obtained without a well-planned probability survey design. Such non-probability samples cannot automatically be regarded as representative of the population of interest. Several classes of methods for estimation and inference from non-probability samples have been developed in recent years. Quasi-randomization methods assume that non-probability sample selection is governed by an underlying latent random mechanism. The basic idea is to use information collected from a probability ("reference") sample to uncover the latent participation probabilities of the non-probability survey (also known as "propensity scores") and to use them in estimating target finite population parameters. In this paper, we review and compare the theoretical properties of recently developed methods for estimating survey participation probabilities and study their relative performance in simulations.
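The quasi-randomization idea described above can be illustrated with a minimal sketch. It is not taken from the paper and is only one simple variant among the approaches in this literature: the non-probability sample is stacked with the design-weighted reference sample, a weighted logistic regression is fitted to the sample indicator, and the fitted odds are read as estimated participation probabilities. The simulated population, the log-linear participation model, and all names (`fit_weighted_logit`, `pi_true`, `d_ref`, and so on) are assumptions made purely for illustration.

```python
import numpy as np


def fit_weighted_logit(X, z, w, n_iter=50, tol=1e-10):
    """Weighted logistic regression fitted by Newton-Raphson (illustrative helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (z - p))                      # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(0)

# Hypothetical finite population: one covariate x and an outcome y that depends on x.
N = 20_000
x = rng.normal(size=N)
y = 2.0 + x + rng.normal(scale=0.5, size=N)

# Assumed latent participation mechanism (unknown to the analyst in practice).
pi_true = np.minimum(np.exp(-3.5 + 0.8 * x), 1.0)
in_np = rng.random(N) < pi_true          # non-probability sample membership
n_np = int(in_np.sum())

# Reference sample: simple random sample with known design weights N / n_ref.
n_ref = 1_000
ref_idx = rng.choice(N, size=n_ref, replace=False)
d_ref = np.full(n_ref, N / n_ref)

# Stack the two samples: z = 1 for non-probability units (weight 1),
# z = 0 for reference units (design weight).
x_stack = np.concatenate([x[in_np], x[ref_idx]])
X = np.column_stack([np.ones_like(x_stack), x_stack])
z = np.concatenate([np.ones(n_np), np.zeros(n_ref)])
w = np.concatenate([np.ones(n_np), d_ref])

beta = fit_weighted_logit(X, z, w)

# With p = P(z = 1 | x) in the stacked data, the odds p / (1 - p) = exp(X @ beta)
# serve as the estimated participation probability for each non-probability unit.
pi_hat = np.exp(X[:n_np] @ beta)

# Compare the unweighted mean with the inverse-probability-weighted (Hajek) mean.
y_np = y[in_np]
naive_mean = y_np.mean()
weighted_mean = np.sum(y_np / pi_hat) / np.sum(1.0 / pi_hat)
true_mean = y.mean()

print(f"true mean:     {true_mean:.3f}")
print(f"naive mean:    {naive_mean:.3f}")
print(f"weighted mean: {weighted_mean:.3f}")
```

Because participation in this simulated population increases with `x`, and `y` increases with `x`, the unweighted mean of the non-probability sample overstates the population mean, while weighting by the inverse of the estimated participation probabilities should largely remove that bias. The sketch ignores variance estimation and assumes the participation model is correctly specified.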