We introduce a new data fusion method that uses multiple data sources to estimate a smooth, finite-dimensional parameter. Most existing methods use only fully aligned data sources, that is, sources that share common conditional distributions of one or more variables of interest. In many settings, however, fully aligned sources are scarce, so existing methods can require unduly large sample sizes to be useful. Our approach additionally incorporates weakly aligned data sources, meaning sources that are not perfectly aligned, provided that their degree of misalignment can be characterized by a prespecified density ratio model. We describe the resulting gains in efficiency and provide a general means of constructing estimators that achieve these gains. We illustrate our results by fusing data from two harmonized HIV monoclonal antibody prevention efficacy trials to study how a neutralizing antibody biomarker associates with HIV genotype.
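To make the phrase "density ratio model" concrete, the following is a minimal sketch of one familiar form, exponential tilting. It is an illustrative assumption, not necessarily the model class used in this work. The symbols $p^{0}$ (target conditional density), $p^{j}$ (conditional density in weakly aligned source $j$), $h$ (a prespecified feature map), and $\beta_j$ (an unknown finite-dimensional parameter) are hypothetical notation introduced only for this example.

```latex
% Hypothetical exponential-tilt density ratio model (illustrative assumption).
% p^0(y | x): conditional density under the target distribution
% p^j(y | x): conditional density under weakly aligned source j
% h(x, y):    prespecified feature map; beta_j: unknown finite-dimensional parameter
\[
  \frac{p^{j}(y \mid x)}{p^{0}(y \mid x)}
  = \frac{\exp\{\beta_j^{\top} h(x, y)\}}
         {\int \exp\{\beta_j^{\top} h(x, \tilde{y})\}\, p^{0}(\tilde{y} \mid x)\, d\tilde{y}} .
\]
% The denominator normalizes p^j(. | x) so that it integrates to one.
% Setting beta_j = 0 makes the ratio equal to 1, i.e. p^j = p^0,
% which recovers the fully aligned case.
```

Under this sketch, a fully aligned source is the special case $\beta_j = 0$, and the size of $\beta_j$ describes how far source $j$ departs from the target.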