In recent years, mediation analysis has gained increasing popularity across various research fields. The primary objective of mediation analysis is to examine the direct effect of an exposure on an outcome, as well as the indirect effects that occur along the pathways from exposure to outcome. A great number of articles have applied mediation analysis to data from hundreds or thousands of individuals. With the rapid development of technology, the volume of available data increases exponentially, which brings new challenges to researchers. Directly conducting statistical analysis on large datasets is often computationally infeasible. Nonetheless, there is a paucity of findings regarding mediation analysis in the context of big data. In this paper, we propose using subsampled double bootstrap and divide-and-conquer algorithms to conduct statistical mediation analysis on large-scale datasets. The proposed algorithms offer a significant improvement in computational efficiency over the traditional bootstrap confidence interval and the Sobel test, while maintaining desirable confidence interval coverage and power. We conducted extensive numerical simulations to evaluate the performance of the proposed algorithms. The practical applicability of our approach is demonstrated through two real-world data examples.
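To make the divide-and-conquer idea concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a single-mediator linear model, m = i1 + a·x + e1 and y = i2 + c'·x + b·m + e2, splits the data into blocks, fits both regressions by ordinary least squares within each block, averages the block estimates, and forms a Sobel-type standard error for the indirect effect a·b. The function names `ols` and `dc_indirect_effect` and the parameter `n_blocks` are hypothetical, and the rows are assumed to be in random order already.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and their estimated covariance matrix."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, cov


def dc_indirect_effect(x, m, y, n_blocks=10):
    """Divide-and-conquer estimate of the indirect effect a*b.

    Illustrative assumption: a single-mediator linear model with
    independent, roughly equal-sized blocks of already-shuffled rows.
    Returns the point estimate and a Sobel-type standard error.
    """
    a_hats, b_hats, a_vars, b_vars = [], [], [], []
    for idx in np.array_split(np.arange(len(x)), n_blocks):
        ones = np.ones(len(idx))
        # Mediator model: m = i1 + a*x + e1
        beta_m, cov_m = ols(np.column_stack([ones, x[idx]]), m[idx])
        # Outcome model: y = i2 + c'*x + b*m + e2
        beta_y, cov_y = ols(np.column_stack([ones, x[idx], m[idx]]), y[idx])
        a_hats.append(beta_m[1])
        a_vars.append(cov_m[1, 1])
        b_hats.append(beta_y[2])
        b_vars.append(cov_y[2, 2])

    # Average the block estimates; the variance of a mean of K
    # independent estimates is the sum of their variances over K^2.
    a, b = np.mean(a_hats), np.mean(b_hats)
    var_a = np.sum(a_vars) / n_blocks**2
    var_b = np.sum(b_vars) / n_blocks**2

    # Sobel-type (first-order delta method) standard error of a*b.
    se = np.sqrt(a**2 * var_b + b**2 * var_a)
    return a * b, se
```

Each block only needs to fit two small regressions, so the blocks can be processed independently or in parallel, which is where the computational savings of a divide-and-conquer scheme come from. A normal-theory interval would then be the estimate plus or minus 1.96 times the standard error, although that approximation can be poor when a·b is close to zero.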