Data aggregation, also known as meta-analysis, is widely used to combine knowledge about parameters shared across multiple studies (e.g., an average treatment effect). In this paper, we introduce an attractive data aggregation scheme that pools summary statistics from various existing studies. Our scheme informs the design of new validation studies and yields unbiased estimators of the shared parameters. In our setup, each existing study applies LASSO regression to select a parsimonious model from a large set of covariates. It is well known that post-hoc estimators in the selected model tend to be biased. We show that a novel technique called \textit{data carving} yields a new unbiased estimator by aggregating simple summary statistics from all existing studies. Our estimator has two key features: (a) it makes the fullest possible use of the data from all studies without the risk of bias from model selection; and (b) it offers the added benefit of individual data privacy, because raw data from these studies need not be shared or stored for efficient estimation.
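The two mechanical claims above can be checked with a small simulation. The first is that estimates reported only when LASSO selects a variable are biased. The second is that a selection-safe estimator can be pooled from per-study summary statistics alone. The sketch below is not the data carving estimator of this paper. It substitutes sample splitting, a simpler selection-safe baseline that selects on one half of each study and estimates on the other. Data carving improves on this baseline by also reusing the information left in the selection half. The sketch assumes a single coefficient and an orthonormal design ($X^\top X / n = I$), under which LASSO with penalty `lam` keeps the coefficient exactly when the absolute OLS estimate exceeds `lam`. The values of `beta`, `sigma`, `lam` and the study sizes are hypothetical. Each study reports only an (estimate, variance) pair, and these pairs are pooled by inverse-variance weighting.

```python
import random

# Illustrative assumptions (not from the paper): one coefficient, orthonormal
# design with X^T X / n = I, so the OLS estimate from m observations is
# Normal(beta, sigma^2 / m) and LASSO keeps the coefficient iff |OLS| > lam.


def ols_estimate(beta, sigma, m, rng):
    """Draw the OLS estimate of one coefficient from m observations."""
    return rng.gauss(beta, sigma / m ** 0.5)


def study_summaries(beta, sigma, n, lam, rng):
    """Simulate one study; return the (estimate, variance) it would report.

    Returns (naive, split); each is None when the coefficient is not selected.
    """
    n_sel = n // 2
    n_est = n - n_sel
    sel = ols_estimate(beta, sigma, n_sel, rng)  # estimate from selection half
    est = ols_estimate(beta, sigma, n_est, rng)  # estimate from held-out half
    full = (n_sel * sel + n_est * est) / n       # full-sample estimate, same data

    # Naive: select and estimate on all n observations.
    naive = (full, sigma ** 2 / n) if abs(full) > lam else None
    # Sample splitting: select on one half, report the independent other half.
    split = (est, sigma ** 2 / n_est) if abs(sel) > lam else None
    return naive, split


def pooled(summaries):
    """Inverse-variance weighted mean of the reported (estimate, variance) pairs."""
    reported = [s for s in summaries if s is not None]
    weights = [1.0 / var for _, var in reported]
    return sum(w * e for w, (e, _) in zip(weights, reported)) / sum(weights)


def meta_analysis(beta=0.1, sigma=1.0, lam=0.2, n_studies=5000, seed=0):
    """Pool naive and split-based summaries over many simulated studies."""
    rng = random.Random(seed)
    naive_all, split_all = [], []
    for _ in range(n_studies):
        n = rng.choice([80, 100, 200])  # hypothetical study sizes
        naive, split = study_summaries(beta, sigma, n, lam, rng)
        naive_all.append(naive)
        split_all.append(split)
    return pooled(naive_all), pooled(split_all)


naive_pool, split_pool = meta_analysis()
print(f"true beta = 0.1, naive pooled = {naive_pool:.3f}, split pooled = {split_pool:.3f}")
```

With these settings, the naive pooled estimate should land well above the true value of 0.1, because only studies whose estimate cleared the threshold report it. The split-based pooled estimate should sit close to 0.1, but each study estimates from only half its data. Recovering that lost efficiency without reintroducing selection bias is the gap data carving is meant to close.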