Statistical analysis of large datasets is challenging because computing devices have limited memory and computation time becomes excessive. Divide-and-conquer (DC) algorithms are an effective remedy, but they have some limitations. Empirical likelihood is an important semiparametric and nonparametric statistical method for parameter estimation and statistical inference, and estimating equations build a bridge between empirical likelihood and traditional statistical methods, which makes empirical likelihood widely applicable to traditional statistical models. In this paper, we propose a novel approach to address the challenges that massive data pose for empirical likelihood, called split sample mean empirical likelihood (SSMEL). We show that the SSMEL estimator has the same estimation efficiency as the empirical likelihood estimator based on the full dataset and that it retains the important statistical property of Wilks' theorem, so the proposed approach can be used for statistical inference. The effectiveness of the proposed approach is illustrated using simulation studies and real data analysis.
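The abstract does not spell out the construction, so the following is only a minimal sketch of one reading of the name: the sample is split into blocks, the estimating function is averaged within each block, and ordinary empirical likelihood is applied to the block means. It assumes the simplest case, a scalar mean with estimating function g(x, mu) = x - mu, and an even split into k blocks. The function names `split_sample_means` and `el_log_ratio_mean` are hypothetical and are not taken from the paper.

```python
import numpy as np

def split_sample_means(x, k):
    """Split x into k consecutive blocks and return the k block means."""
    blocks = np.array_split(np.asarray(x, dtype=float), k)
    return np.array([b.mean() for b in blocks])

def el_log_ratio_mean(z, mu, iters=200):
    """Return -2 log empirical likelihood ratio for the hypothesis E[z] = mu."""
    d = np.asarray(z, dtype=float) - mu
    if d.min() >= 0.0 or d.max() <= 0.0:
        return np.inf  # mu lies outside the convex hull of z
    # The Lagrange multiplier must keep every weight positive: 1 + lam * d_i > 0.
    lo, hi = -1.0 / d.max(), -1.0 / d.min()
    # The score sum(d_i / (1 + lam * d_i)) is decreasing in lam, so bisect on it.
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        score = np.sum(d / (1.0 + lam * d))
        if score > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * d))

# Illustrative data (an assumption, not from the paper): 10,000 draws, 100 blocks.
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=10_000)
z = split_sample_means(x, k=100)

stat_at_mean = el_log_ratio_mean(z, z.mean())        # close to zero
stat_shifted = el_log_ratio_mean(z, z.mean() + 0.1)  # grows away from the mean
```

Under this reading, the optimization involves only k block means rather than all n observations, which is where the memory and time savings would come from. A Wilks-type result would let the statistic be compared with a chi-square quantile with one degree of freedom to form a confidence interval for mu.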