Integrating multiple observational studies to make unconfounded causal or descriptive comparisons of group potential outcomes in a large natural population is challenging. Moreover, retrospective cohorts, being convenience samples, are usually unrepresentative of the natural population of interest and have groups with unbalanced covariates. We propose a general covariate-balancing framework based on pseudo-populations that extends established weighting methods to the meta-analysis of multiple retrospective cohorts with multiple groups. Within this framework, we propose a FLEXible, Optimized, and Realistic (FLEXOR) weighting method that maximizes the effective sample sizes of the cohorts and is appropriate for integrative analyses. We develop new weighted estimators for unconfounded inferences on wide-ranging population-level features and estimands relevant to group comparisons of quantitative, categorical, or multivariate outcomes. The asymptotic properties of these estimators are examined, and accurate small-sample procedures are devised for quantifying estimation uncertainty. Through simulation studies and meta-analyses of TCGA datasets, we discover the differential biomarker patterns of the two major breast cancer subtypes in the United States and demonstrate the versatility and reliability of the proposed weighting strategy, especially for the FLEXOR pseudo-population.
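The abstract refers to the effective sample size of a weighted cohort without defining it. As a rough illustration only, the sketch below assumes the common Kish formula, $(\sum_i w_i)^2 / \sum_i w_i^2$; the definition used in the paper may differ, and the function name `effective_sample_size` is hypothetical rather than part of any FLEXOR software.

```python
def effective_sample_size(weights):
    """Kish's effective sample size, (sum w)^2 / sum(w^2).

    Assumption: this is the generic survey-sampling formula, used here
    only to illustrate the idea; it is not taken from the paper.
    """
    w = [float(x) for x in weights]
    if not w:
        raise ValueError("weights must be non-empty")
    if any(x < 0 for x in w):
        raise ValueError("weights must be non-negative")
    total = sum(w)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return total ** 2 / sum(x * x for x in w)


# Equal weights lose nothing: four subjects count as four.
print(effective_sample_size([1, 1, 1, 1]))

# One dominant weight reduces the effective sample size below n = 4.
print(effective_sample_size([1, 1, 1, 3]))
```

Under this formula, equal weights give an effective sample size equal to the number of subjects, while uneven weights shrink it. This suggests why a weighting scheme that maximizes the effective sample size would tend to yield more precise weighted estimators.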