Nonprobability (convenience) samples are increasingly used to supplement a randomized survey (reference) sample: by increasing the effective sample size, they can reduce the estimation variance for one or more population variables of interest. Estimating a population quantity from a convenience sample, however, will typically produce bias, because the distribution of the variables of interest in the convenience sample differs from the population distribution. A recent set of approaches estimates inclusion probabilities for convenience sample units by specifying reference sample-weighted pseudo likelihoods. As our main result, this paper introduces a novel approach that derives the propensity score for the observed sample as a function of the inclusion probabilities for the reference and convenience samples. Our approach allows a likelihood to be specified directly for the observed sample, rather than an approximate or pseudo likelihood. We construct a Bayesian hierarchical formulation that simultaneously estimates the sample propensity scores and the convenience sample inclusion probabilities. We use a Monte Carlo simulation study to compare our likelihood-based results with the pseudo likelihood-based approaches considered in the literature.