Outcome-dependent sampling designs are widely used in scientific disciplines such as epidemiology, ecology, and economics; the retrospective case-control study is a familiar example. When the outcome used for sample selection is also mismeasured, accurately estimating the average treatment effect (ATE) becomes even more challenging. To our knowledge, no existing method addresses these two issues simultaneously. In this paper, we establish the identifiability of the ATE and propose a novel method for estimating it under a generalized linear model. The estimator is shown to be consistent under some regularity conditions. To relax the model assumption, we also consider a generalized additive model, for which we propose to estimate the ATE using penalized B-splines and establish asymptotic properties of the proposed estimator. Our methods are evaluated through extensive simulation studies and an application to a dataset from the UK Biobank, with alcohol intake as the treatment and gout as the outcome.