The recent proliferation of computers and the internet has opened new opportunities for collecting and processing data. However, such data are often obtained without a well-planned probability survey design. Such non-probability samples cannot automatically be regarded as representative of the population of interest. Several classes of methods for estimation and inference from non-probability samples have been developed in recent years. Quasi-randomization methods assume that the selection of the non-probability sample is governed by an underlying latent random mechanism. The basic idea is to use information collected from a probability ("reference") sample to uncover the latent participation probabilities of the non-probability sample (also known as "propensity scores") and to use them in the estimation of target finite population parameters. In this paper, we review and compare the theoretical properties of recently developed methods for estimating survey participation probabilities and study their relative performance in simulations.
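The quasi-randomization idea can be illustrated with a minimal sketch. This is not necessarily any of the specific methods compared in the paper. It makes several assumptions. Participation follows a logistic model pi(x) = expit(x'theta). The covariates x are observed in both the non-probability sample A and the reference sample B. The design weights d of B are known. Under these assumptions, theta is estimated from one pseudo-likelihood estimating equation, in which the design-weighted reference sample stands in for the unavailable population total. The estimated propensities then feed an inverse-propensity-weighted (Hajek-type) mean. The function names `fit_participation_model` and `ipw_mean` are hypothetical, and so is the simulated population.

```python
import numpy as np


def expit(z):
    # Logistic function: maps a linear predictor to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))


def fit_participation_model(x_a, x_b, d_b, max_iter=100, tol=1e-10):
    """Estimate theta in the assumed model pi(x) = expit(x @ theta).

    x_a : (n_a, p) covariates observed in the non-probability sample A
    x_b : (n_b, p) the same covariates observed in the reference sample B
    d_b : (n_b,)   design weights of the reference sample B

    Solves  sum_{i in A} x_i - sum_{i in B} d_i * pi(x_i) * x_i = 0
    by Newton-Raphson; the weighted sum over B estimates the population
    total sum_{i in U} pi(x_i) * x_i, which cannot be computed directly.
    """
    theta = np.zeros(x_a.shape[1])
    total_x_a = x_a.sum(axis=0)
    for _ in range(max_iter):
        pi_b = expit(x_b @ theta)
        score = total_x_a - (d_b * pi_b) @ x_b
        # Negative Jacobian of the score: sum_B d_i pi_i (1 - pi_i) x_i x_i^T.
        info = x_b.T @ (x_b * (d_b * pi_b * (1.0 - pi_b))[:, None])
        step = np.linalg.solve(info, score)
        theta += step
        if np.max(np.abs(step)) < tol:
            break
    return theta


def ipw_mean(y_a, x_a, theta):
    # Hajek-type inverse-propensity-weighted mean of y over sample A.
    w = 1.0 / expit(x_a @ theta)
    return np.sum(w * y_a) / np.sum(w)


# Hypothetical simulated population, for illustration only.
rng = np.random.default_rng(0)
N = 20_000
x1 = rng.normal(size=N)
X = np.column_stack([np.ones(N), x1])  # intercept + one covariate
y = 2.0 + 1.5 * x1 + rng.normal(size=N)

# Non-probability sample A: self-selection depends on x1 (unknown to the analyst).
in_a = rng.random(N) < expit(-3.0 + 0.8 * x1)

# Reference sample B: simple random sample without replacement, weights N / n_b.
n_b = 1_000
idx_b = rng.choice(N, size=n_b, replace=False)
d_b = np.full(n_b, N / n_b)

theta_hat = fit_participation_model(X[in_a], X[idx_b], d_b)
print("estimated theta:", theta_hat)
print("naive mean of A:", y[in_a].mean())
print("IPW mean of A:  ", ipw_mean(y[in_a], X[in_a], theta_hat))
print("population mean:", y.mean())
```

Because units with larger x1 are over-represented in A, the naive mean is expected to be biased upward. The weighted mean should move toward the population mean when the logistic model is correctly specified. In this sketch, the reference sample is used only through its covariates and design weights, so the outcome y never needs to be observed in B.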