In recent years, mediation analysis has become increasingly popular in many research fields. The aim of mediation analysis is to investigate the direct effect of an exposure on an outcome, together with the indirect effects that act along the pathways from exposure to outcome. Many articles have applied mediation analysis to data from hundreds or thousands of individuals. With the rapid development of technology, the volume of available data is increasing exponentially, which brings new challenges to researchers: it is often computationally infeasible to conduct statistical analysis directly on large datasets. Yet there are very few results on mediation analysis with massive data. In this paper, we propose to use the subsampled double bootstrap together with a divide-and-conquer algorithm to perform statistical mediation analysis for large-scale datasets. Extensive numerical simulations are conducted to evaluate the performance of our method. Two real data examples are also provided to illustrate the usefulness of our approach in practice.
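To make the two techniques named in the abstract concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes the simplest single-mediator linear model, in which the indirect effect is the product of coefficients a*b, and it uses one common formulation of the subsampled double bootstrap (a small random subset, then a single size-n resample of that subset expressed as multinomial weights). The function names `indirect_effect`, `divide_and_conquer`, and `sdb_standard_error`, and all parameter names and defaults, are hypothetical and chosen for illustration only.

```python
import numpy as np


def indirect_effect(x, m, y, w=None):
    """Product-of-coefficients estimate a*b from two (weighted) least-squares fits.

    Assumed model (illustrative only): m = i1 + a*x + e1, y = i2 + c*x + b*m + e2.
    """
    n = len(x)
    ones = np.ones(n)
    # Weighted least squares is done by scaling rows with sqrt(weight).
    sw = ones if w is None else np.sqrt(w)
    # Mediator model: coefficient of x is a.
    Xa = np.column_stack([ones, x]) * sw[:, None]
    a = np.linalg.lstsq(Xa, m * sw, rcond=None)[0][1]
    # Outcome model: coefficient of m is b.
    Xb = np.column_stack([ones, x, m]) * sw[:, None]
    b = np.linalg.lstsq(Xb, y * sw, rcond=None)[0][2]
    return a * b


def divide_and_conquer(x, m, y, n_blocks=10):
    """Split the data into blocks, estimate a*b on each, and average the results."""
    blocks = np.array_split(np.arange(len(x)), n_blocks)
    return float(np.mean([indirect_effect(x[i], m[i], y[i]) for i in blocks]))


def sdb_standard_error(x, m, y, subset_size, n_subsets=200, seed=None):
    """Standard error of a*b via a subsampled double bootstrap (assumed variant).

    For each of n_subsets random subsets, draw ONE resample of the full size n
    from that subset (stored as multinomial counts) and record the difference
    between the resample estimate and the subset estimate.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    probs = np.full(subset_size, 1.0 / subset_size)
    roots = []
    for _ in range(n_subsets):
        sub = rng.choice(n, size=subset_size, replace=False)
        xs, ms, ys = x[sub], m[sub], y[sub]
        counts = rng.multinomial(n, probs).astype(float)
        roots.append(indirect_effect(xs, ms, ys, counts) - indirect_effect(xs, ms, ys))
    return float(np.std(roots, ddof=1))
```

Each fit in this sketch touches at most `subset_size` rows (or one block), which is what keeps the cost manageable when n is large; the point estimate and its standard error could then be combined into a normal-approximation confidence interval, again as an assumption rather than the procedure described in the paper.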