Empirical likelihood is an important nonparametric approach with wide application. With massive data, however, computing the empirical log-likelihood ratio statistic is difficult and can be infeasible; the main challenge is the calculation of the Lagrange multiplier. This motivates us to develop a distributed empirical likelihood method that calculates the Lagrange multiplier in a multi-round distributed manner. We show that the distributed empirical log-likelihood ratio statistic is asymptotically standard chi-squared under mild conditions. The proposed algorithm is communication-efficient and achieves the desired accuracy in a few rounds. Further, the distributed empirical likelihood method is extended to the case of Byzantine failures: a machine selection algorithm is developed to identify the worker machines without Byzantine failures, so that the distributed empirical likelihood method can be applied to them. The proposed methods are evaluated by numerical simulations and illustrated with an airline on-time performance study and a surface climate analysis of the Yangtze River Economic Belt.
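To make the multi-round idea concrete, the following is a minimal sketch and not the algorithm proposed here. It assumes the simplest setting, empirical likelihood for a mean, where the Lagrange multiplier maximizes the sum of log(1 + lambda'z_i) over centered observations z_i. It also swaps in a plain aggregated Newton iteration: in each round every worker sends only a gradient vector and a Hessian matrix computed at the current multiplier. The names `local_summaries` and `distributed_lagrange_multiplier`, the step-halving rule, and the default of 10 rounds are all hypothetical choices made for illustration.

```python
import numpy as np

def local_summaries(z, lam):
    """Worker side: gradient and Hessian of sum(log(1 + z @ lam)) on one shard.

    z is an (n_k, d) array of centered observations x_i - mu (an assumption:
    the mean is the only parameter in this sketch).
    """
    w = 1.0 + z @ lam                      # shape (n_k,), must stay positive
    grad = (z / w[:, None]).sum(axis=0)    # sum of z_i / (1 + lam'z_i)
    hess = -(z / (w ** 2)[:, None]).T @ z  # -sum of z_i z_i' / (1 + lam'z_i)^2
    return grad, hess

def distributed_lagrange_multiplier(shards, rounds=10, tol=1e-10):
    """Master side: aggregated Newton iteration, one communication per round."""
    d = shards[0].shape[1]
    lam = np.zeros(d)
    for _ in range(rounds):
        parts = [local_summaries(z, lam) for z in shards]
        grad = sum(p[0] for p in parts)
        hess = sum(p[1] for p in parts)
        step = np.linalg.solve(hess, grad)
        # Halve the step until every weight 1 + lam'z_i stays positive.
        t = 1.0
        while any(np.any(1.0 + z @ (lam - t * step) <= 0.0) for z in shards):
            t /= 2.0
        lam = lam - t * step
        if np.linalg.norm(t * step) < tol:
            break
    # Empirical log-likelihood ratio statistic: 2 * sum(log(1 + lam'z_i)).
    stat = 2.0 * sum(np.log(1.0 + z @ lam).sum() for z in shards)
    return lam, stat

# Hypothetical usage: simulated data with true mean zero, split over 5 workers.
rng = np.random.default_rng(0)
z = rng.normal(size=(3000, 2))
lam, stat = distributed_lagrange_multiplier(np.array_split(z, 5))
```

Because the gradient and Hessian are sums over observations, aggregating them across workers reproduces the single-machine Newton step up to rounding, so this sketch illustrates the communication pattern only; it says nothing about the communication cost or the Byzantine-robust machine selection described above.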