A Capture-Recapture Approach to Facilitate Causal Inference for a Trial-eligible Observational Cohort

Background: We extend recently proposed design-based capture-recapture (CRC) methods for prevalence estimation among registry participants to support causal inference in a trial-eligible target population. The proposed CRC design integrates an observational study cohort with a randomized trial involving a small representative study sample, and enhances the generalizability and transportability of the findings. Methods: We develop a novel CRC-type estimator, derived by maximum likelihood under a multinomial distribution, that exploits the design to improve validity and efficiency when comparing the effects of two treatments on a binary outcome. The design also enables a direct standardization-type estimator for efficient estimation of general means (e.g., of biomarker levels) under a specific treatment, and for their comparison across treatments. For inference, we propose a tailored Bayesian credible interval approach that improves coverage when used with the proposed CRC estimator for binary outcomes, along with a bootstrap percentile interval approach for continuous outcomes. Results: Simulations demonstrate the performance of the proposed estimators derived from the CRC design. The multinomial-based maximum-likelihood estimator shows gains in validity and efficiency for treatment effect comparisons, while the direct standardization-type estimator allows comprehensive comparison of treatment effects within the target population. Conclusion: The extended CRC methods provide a useful framework for causal inference in a trial-eligible target population by integrating observational and randomized trial data. The novel estimators enhance the generalizability and transportability of findings, offering efficient and valid tools for treatment effect comparisons on both binary and continuous outcomes.
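
As a rough illustration of the bootstrap percentile interval mentioned above for continuous outcomes, the following sketch computes a generic percentile interval for a difference in mean outcome between two treatment arms. It is not the direct standardization-type estimator proposed here: the simple difference in arm means is a stand-in, and the function name `bootstrap_percentile_ci`, its parameters, and the biomarker values are all hypothetical.

```python
import random
import statistics


def bootstrap_percentile_ci(y_a, y_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(y_a) - mean(y_b).

    Assumption: a plain difference in arm means stands in for the
    treatment comparison; the abstract's estimator is more involved.
    """
    rng = random.Random(seed)
    y_a, y_b = list(y_a), list(y_b)
    diffs = []
    for _ in range(n_boot):
        # Resample each arm with replacement, keeping arm sizes fixed.
        res_a = rng.choices(y_a, k=len(y_a))
        res_b = rng.choices(y_b, k=len(y_b))
        diffs.append(statistics.fmean(res_a) - statistics.fmean(res_b))
    diffs.sort()
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the replicates.
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    estimate = statistics.fmean(y_a) - statistics.fmean(y_b)
    return estimate, diffs[lo_idx], diffs[hi_idx]


if __name__ == "__main__":
    # Hypothetical biomarker levels under two treatments.
    treat_a = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7, 5.3, 6.2]
    treat_b = [4.2, 4.9, 4.4, 5.0, 4.1, 4.6, 4.8, 4.3]
    est, lower, upper = bootstrap_percentile_ci(treat_a, treat_b)
    print(f"difference = {est:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

The interval endpoints are read directly from the sorted bootstrap replicates, which is what distinguishes the percentile approach from intervals built on a normal approximation.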