Suppose we have individual data from an internal study and various summary statistics from relevant external studies. External summary statistics have the potential to improve statistical inference for the internal population; however, they may lead to efficiency loss or bias if not used properly. We study the fusion of individual data and summary statistics in a semiparametric framework to investigate the efficient use of external summary statistics. Under a weak transportability assumption, we establish the semiparametric efficiency bound for estimating a general functional of the internal data distribution. This bound is no larger than the bound based on internal data alone, which underpins the potential efficiency gain from integrating individual data and summary statistics. We propose a data-fused efficient estimator that achieves this efficiency bound. In addition, we propose an adaptive fusion estimator that eliminates the bias of the original data-fused estimator when the transportability assumption fails, and we establish its asymptotic oracle property. Simulations and an application to a Helicobacter pylori infection dataset demonstrate the promising numerical performance of the proposed method.
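To give a feel for why an external summary statistic can reduce variance, the following is a minimal sketch under strong simplifying assumptions; it is not the estimator proposed in the paper. It assumes the target is the internal mean of an outcome `y`, that the only external summary is the sample mean of a covariate `x` from an external study of size `ext_n`, and that the covariate has the same mean and variance in both populations (a simple stand-in for transportability). The function names `fused_mean` and `simulate`, the data-generating model, and all numeric settings are hypothetical and chosen only for illustration.

```python
import numpy as np


def fused_mean(y, x, ext_mean_x, ext_n):
    """Control-variate style estimate of E[Y] that borrows an external mean of X.

    Illustrative only: assumes X has the same mean and variance in the
    internal and external populations.
    """
    n = len(y)
    cov = np.cov(x, y, ddof=1)          # 2x2 sample covariance matrix of (x, y)
    slope = cov[0, 1] / cov[0, 0]       # regression slope of y on x
    weight = ext_n / (ext_n + n)        # accounts for noise in the external mean
    return y.mean() - weight * slope * (x.mean() - ext_mean_x)


def simulate(n=200, ext_n=5000, reps=2000, seed=0):
    """Compare the variance of the internal-only mean and the fused estimate."""
    rng = np.random.default_rng(seed)
    internal_only, fused = [], []
    for _ in range(reps):
        # Hypothetical internal study: y depends linearly on x plus noise.
        x = rng.normal(size=n)
        y = 1.0 + 0.8 * x + rng.normal(scale=0.6, size=n)
        # External summary statistic: mean of x from ext_n external subjects.
        ext_mean_x = rng.normal(scale=1.0 / np.sqrt(ext_n))
        internal_only.append(y.mean())
        fused.append(fused_mean(y, x, ext_mean_x, ext_n))
    return np.var(internal_only), np.var(fused)


if __name__ == "__main__":
    var_internal, var_fused = simulate()
    print(f"variance, internal data only: {var_internal:.5f}")
    print(f"variance, with external mean: {var_fused:.5f}")
```

In this toy setting the fused estimate has a smaller sampling variance than the internal-only mean because the external mean pins down part of the variation in `x`. If the external covariate mean differed from the internal one, the same correction would introduce bias, which is the situation the adaptive fusion estimator described above is designed to handle.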