To address the multidimensional nature of health-related questions, advances in health research often require integrating information from several data sources within a single statistical analysis. When complementary information on the same set of individuals is distributed across different institutions (vertically partitioned data), vertical methods make it possible to obtain analysis results without sharing or pooling individual-level data. To guide stakeholders toward a transparent use of vertical methods, this study aims to (1) identify existing vertical methods that enable statistical inference, and (2) characterize the methodological properties of these methods and the current extent of their use with health data. We conducted a scoping review using four interdisciplinary databases. We then systematically extracted the characteristics of the identified vertical methods with respect to comparability with the pooled analysis, efficiency of the communication scheme, and confidentiality. We additionally screened studies citing the included articles to identify applications to vertically partitioned real-world health data. Of the 2887 articles initially screened, 30 were included in the review. Inference in the linear and logistic regression frameworks was the most frequent statistical inference task addressed by the proposed methods. Equivalence with the pooled analysis was not systematically addressed, and most methods required multiple rounds of communication between participating parties. Almost all articles described their approach as privacy-preserving, although only a minority provided a privacy assessment. The scope of existing approaches enabling statistical inference on vertically partitioned data is still relatively limited. Most existing methods do not simultaneously achieve results equivalent to the centralized analysis, high communication efficiency, and guaranteed protection of individual-level data.
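As a minimal illustration of the setting (not one of the reviewed methods), the sketch below assumes two hypothetical parties whose records are already aligned on the same six individuals: party A holds a covariate `x_a`, and party B holds a covariate `x_b` and the outcome `y`. Ordinary least squares depends on the data only through the Gram matrix X^T X and the vector X^T y. Most entries can be computed locally by one party, but the cross-party terms (here `sum(x_a * x_b)` and `sum(x_a * y)`) need both parties' individual-level columns. That is where communication and confidentiality come into play. The function `secure_cross_product` is a hypothetical placeholder for a privacy-preserving protocol. As written, it simply computes the dot product and would therefore reveal the data.

```python
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - dot(M[r][r + 1:n], x[r + 1:n])) / M[r][r]
    return x


def secure_cross_product(u: List[float], v: List[float]) -> float:
    # HYPOTHETICAL placeholder: a real vertical method would obtain this value
    # through a privacy-preserving protocol; this version exposes both inputs.
    return dot(u, v)


def vertical_ols(a: Dict[str, List[float]], b: Dict[str, List[float]]) -> List[float]:
    """OLS of y on [1, x_a, x_b] assembled from per-party aggregates."""
    xa, xb, y = a["x_a"], b["x_b"], b["y"]
    n = float(len(xa))
    s_a, s_aa = sum(xa), dot(xa, xa)                             # local to party A
    s_b, s_bb, s_by, s_y = sum(xb), dot(xb, xb), dot(xb, y), sum(y)  # local to party B
    s_ab = secure_cross_product(xa, xb)                          # cross-party terms
    s_ay = secure_cross_product(xa, y)
    gram = [[n, s_a, s_b], [s_a, s_aa, s_ab], [s_b, s_ab, s_bb]]
    return solve(gram, [s_y, s_ay, s_by])


def pooled_ols(a: Dict[str, List[float]], b: Dict[str, List[float]]) -> List[float]:
    """Reference pooled analysis on the concatenated design matrix."""
    X = [[1.0, xa_i, xb_i] for xa_i, xb_i in zip(a["x_a"], b["x_b"])]
    cols = list(zip(*X))
    gram = [[dot(ci, cj) for cj in cols] for ci in cols]
    return solve(gram, [dot(ci, b["y"]) for ci in cols])


# Hypothetical toy data generated without noise as y = 1 + 2 * x_a + 0.5 * x_b.
party_a = {"x_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}
party_b = {"x_b": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
           "y": [4.0, 5.5, 9.0, 10.5, 14.0, 15.5]}
```

Calling `vertical_ols(party_a, party_b)` should agree with `pooled_ols(party_a, party_b)` up to floating-point rounding. Both recover coefficients close to `[1.0, 2.0, 0.5]`, which illustrates equivalence with the pooled analysis. Whether confidentiality holds depends entirely on how the cross-party terms are actually computed.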