Nonprobability follow-up sample analysis: an application to SARS-CoV-2 infection prevalence estimation

Public health policy makers must make crucial decisions rapidly during infectious disease outbreaks such as that caused by SARS-CoV-2. Ideally, rapidly deployed representative health surveys would provide the data needed for such decisions. Under the constraints of a limited timeframe and limited resources, however, it may be infeasible to implement random (probability) sampling that yields a population-representative survey sample with a high response rate. As an alternative, a volunteer (nonprobability) sample is often collected using outreach methods such as social media and web surveys. Unlike a probability sample, a nonprobability sample is subject to selection bias. In addition, when participants are followed longitudinally, nonresponse often occurs at later follow-up timepoints. As a result, estimates of cross-sectional parameters at later timepoints are subject to both selection bias and nonresponse bias. In this paper, we create kernel-weighted pseudoweights (KW) for the baseline survey participants and construct nonresponse-adjusted KW (kwNR) for the respondents at each follow-up visit in order to estimate the population mean at that visit. We develop Taylor linearization variance estimation that accounts for the variability due to estimating both the pseudoweights and the nonresponse adjustments. Simulations are conducted to evaluate the proposed kwNR-weighted estimates. We investigate covariate effects on each of the following: baseline sample participation propensity, follow-up response propensity, and the mean of the outcome. We apply the proposed kwNR-weighted methods to a longitudinal study of SARS-CoV-2 antibody seropositivity, which begins with a baseline survey early in the pandemic and collects follow-up data at six and twelve months post-baseline.
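
The weighting idea summarized above can be illustrated with a minimal sketch. The abstract does not state the exact form of the nonresponse adjustment, so the code below assumes a common choice: each follow-up respondent's baseline pseudoweight is divided by an estimated response propensity, and the population mean is estimated with a weighted ratio mean. The function names, the toy data, and the assumption that the baseline pseudoweights and response propensities have already been estimated are all hypothetical; the kernel-weighting step and the Taylor linearization variance are not shown.

```python
def nonresponse_adjusted_weights(baseline_weights, response_propensities):
    """Divide each respondent's baseline pseudoweight by an estimated
    follow-up response propensity (assumed form of the adjustment)."""
    if len(baseline_weights) != len(response_propensities):
        raise ValueError("weights and propensities must have the same length")
    adjusted = []
    for weight, propensity in zip(baseline_weights, response_propensities):
        if not 0.0 < propensity <= 1.0:
            raise ValueError("response propensity must lie in (0, 1]")
        adjusted.append(weight / propensity)
    return adjusted


def weighted_mean(values, weights):
    """Weighted ratio estimate of a population mean."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical toy data for four follow-up respondents.
kw = [120.0, 80.0, 200.0, 50.0]   # baseline pseudoweights (assumed given)
propensity = [0.5, 0.25, 1.0, 0.5]  # estimated response propensities (assumed given)
seropositive = [1, 0, 0, 1]         # outcome observed at the follow-up visit

kwnr = nonresponse_adjusted_weights(kw, propensity)
prevalence = weighted_mean(seropositive, kwnr)
print(kwnr, prevalence)
```

In this sketch, respondents who were less likely to respond at follow-up receive larger adjusted weights, so that they also stand in for similar baseline participants who dropped out.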
