Knowledge transfer across data sources holds great promise for improving the estimation of target population parameters by leveraging the growing availability of data from different sources. However, its effectiveness is often limited by complex and pervasive heterogeneity between data sources and by the lack of access to individual-level data. This paper proposes the divide-and-shrink (dShrink) method, a transfer estimation method that estimates target population parameters in closed form using summary statistics from a target population and one or more external source populations while accounting for population heterogeneity. The dShrink estimator is guaranteed to outperform the estimator based solely on the target population in terms of expected quadratic error under arbitrary population heterogeneity. The gain can be substantial when the target and source populations are similar or when the underlying true parameter values are near zero. Notably, dShrink is model-free, requires no user-specified tuning parameters, is robust to various types of heterogeneity between data sources, and applies to a broad range of parameter estimation problems. It remains effective even when the covariance matrix of the external summary statistics is not accessible, and it offers flexibility in incorporating side information and summary statistics from multiple source populations. Simulations and real data analyses demonstrate the superior performance of the dShrink estimator and its potential as a robust tool for transfer estimation.
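To make the idea of shrinking a target estimate toward an external source estimate concrete, the sketch below uses the classical positive-part James-Stein estimator. It is not the dShrink estimator, whose formula the abstract does not give. It assumes the target estimate is distributed as N(theta, sigma2 * I) with known sigma2, has at least three components, and is independent of the source estimate; under those assumptions its expected quadratic error is no larger than that of the target estimate alone, however far the source is from the truth. The function name and arguments are hypothetical.

```python
def shrink_toward_source(theta_target, theta_source, sigma2):
    """Positive-part James-Stein shrinkage of a target estimate toward a source estimate.

    Illustrative stand-in only, not dShrink. Assumes theta_target ~ N(theta, sigma2 * I)
    with known sigma2 and theta_source independent of theta_target.
    """
    p = len(theta_target)
    diff = [t - s for t, s in zip(theta_target, theta_source)]
    dist2 = sum(d * d for d in diff)

    # The classical risk guarantee needs p >= 3; otherwise keep the target estimate.
    # If the two estimates coincide there is nothing to shrink.
    if p < 3 or dist2 == 0.0:
        return list(theta_target)

    # Weight on the target-minus-source difference, truncated at zero.
    # A small distance relative to sigma2 pulls the result toward the source;
    # a large distance leaves the target estimate almost unchanged.
    weight = max(0.0, 1.0 - (p - 2) * sigma2 / dist2)
    return [s + weight * d for s, d in zip(theta_source, diff)]


# Source far from the target: little shrinkage.
far = shrink_toward_source([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0], 1.0)

# Source close to the target: the estimate collapses onto the source.
near = shrink_toward_source([1.0, 1.0, 1.0], [1.0, 1.0, 1.5], 1.0)
```

The data-driven weight is what makes the procedure free of tuning parameters in this simplified setting: the amount of borrowing is determined by how far apart the two estimates are relative to the noise level.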