Integrating multiple observational studies to make unconfounded causal or descriptive comparisons of group potential outcomes in a large natural population is challenging. Moreover, retrospective cohorts, being convenience samples, are usually unrepresentative of the natural population of interest and have groups with unbalanced covariates. We propose a general covariate-balancing framework based on pseudo-populations that extends established weighting methods to the meta-analysis of multiple retrospective cohorts with multiple groups. Additionally, by maximizing the effective sample sizes of the cohorts, we propose a FLEXible, Optimized, and Realistic (FLEXOR) weighting method appropriate for integrative analyses. We develop new weighted estimators for unconfounded inferences on wide-ranging population-level features and estimands relevant to group comparisons of quantitative, categorical, or multivariate outcomes. The asymptotic properties of these estimators are examined, and accurate small-sample procedures are devised for quantifying estimation uncertainty. Through simulation studies and meta-analyses of TCGA datasets, we discover the differential biomarker patterns of the two major breast cancer subtypes in the United States and demonstrate the versatility and reliability of the proposed weighting strategy, especially for the FLEXOR pseudo-population.
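The abstract does not define "effective sample size." A common choice for weighted samples is Kish's effective sample size, (Σw)² / Σw². Whether FLEXOR uses exactly this definition is an assumption here. The sketch below shows why maximizing it matters: uneven weights shrink the effective sample size below the nominal count, so balancing weights that vary less keep more usable information per cohort. The function name `kish_ess` is illustrative and does not come from the paper.

```python
from typing import Sequence


def kish_ess(weights: Sequence[float]) -> float:
    """Kish's effective sample size (sum w)^2 / sum(w^2) for nonnegative weights.

    Assumption: this standard definition stands in for the paper's
    effective sample size; FLEXOR's exact criterion may differ.
    """
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    if sum_sq == 0:
        raise ValueError("weights must not all be zero")
    return total * total / sum_sq


# Equal weights: the effective sample size equals the number of subjects.
print(kish_ess([1.0, 1.0, 1.0, 1.0]))  # 4.0

# One heavily up-weighted subject shrinks it (16/7, about 2.29).
print(kish_ess([1.0, 1.0, 1.0, 5.0]))

# Rescaling all weights by a constant leaves the value unchanged.
print(kish_ess([2.0, 2.0, 2.0, 10.0]))
```

Because the measure is invariant to rescaling, it depends only on how unevenly the weights are spread. This makes it a natural quantity to maximize when choosing among weighting schemes that achieve the same covariate balance.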