As a nonparametric approach to statistical inference, empirical likelihood has proved useful on numerous occasions. However, it encounters serious computational challenges when applied directly to modern massive datasets. This article studies empirical likelihood inference over decentralized distributed networks, where the data are collected and stored locally by different nodes. To make full use of the data, this article fuses the Lagrange multipliers calculated at different nodes by employing a penalization technique. The proposed distributed empirical log-likelihood ratio statistic, with Lagrange multipliers obtained from the penalized objective function, is asymptotically standard chi-squared under regularity conditions, even when the number of machines diverges. Nevertheless, the optimization problem with the fused penalty is still hard to solve in a decentralized distributed network. To address this problem, two algorithms based on the alternating direction method of multipliers (ADMM) are proposed, both of which have simple node-based implementation schemes. Theoretically, this article establishes convergence properties for the proposed algorithms and further proves linear convergence of the second algorithm for certain network structures. The proposed methods are evaluated by numerical simulations and illustrated with analyses of the census income and Ford GoBike datasets.
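For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of the classical single-machine empirical log-likelihood ratio for a mean, with the Lagrange multiplier found by a damped Newton iteration. It is not the distributed, penalized procedure proposed in the article: the function name `el_log_ratio`, its parameters, and the choice of a mean constraint are illustrative assumptions. In this classical setting the returned statistic is approximately chi-squared with d degrees of freedom when `mu` is the true mean.

```python
import numpy as np

def el_log_ratio(x, mu, max_iter=50, tol=1e-10):
    """Classical empirical log-likelihood ratio for a mean (illustrative sketch).

    Solves sum_i z_i / (1 + lam^T z_i) = 0 with z_i = x_i - mu for the
    Lagrange multiplier lam, then returns 2 * sum_i log(1 + lam^T z_i).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                      # treat 1-D input as n x 1
    z = x - np.asarray(mu, dtype=float)     # centered observations, n x d
    lam = np.zeros(z.shape[1])

    def objective(l):
        # Convex dual objective; infinite outside the feasible region.
        w = 1.0 + z @ l
        if np.any(w <= 0):
            return np.inf
        return -np.sum(np.log(w))

    for _ in range(max_iter):
        w = 1.0 + z @ lam
        zw = z / w[:, None]
        grad = -zw.sum(axis=0)
        if np.linalg.norm(grad) < tol:
            break
        hess = zw.T @ zw
        step = np.linalg.solve(hess, -grad)
        # Halve the step until the objective does not increase, which also
        # keeps every implied weight 1 + lam^T z_i positive.
        t, f0 = 1.0, objective(lam)
        while objective(lam + t * step) > f0 and t > 1e-12:
            t *= 0.5
        lam = lam + t * step

    return 2.0 * np.sum(np.log1p(z @ lam)), lam


if __name__ == "__main__":
    data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    stat, lam = el_log_ratio(data, mu=2.5)
    print(stat, lam)
```

The sketch assumes `mu` lies strictly inside the convex hull of the data; otherwise no valid multiplier exists and the iteration will not converge.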