Distributed learning offers a practical solution for the integrative analysis of multi-source datasets, especially under privacy or communication constraints. However, addressing potential distributional heterogeneity while ensuring communication efficiency poses significant challenges for distributed statistical analysis. In this article, we focus on the integrative estimation of distributed heterogeneous precision matrices, a crucial task related to joint precision matrix estimation for which computation-efficient algorithms and statistical optimality theory remain underdeveloped. To tackle these challenges, we introduce a novel HEterogeneity-adjusted Aggregating and Thresholding (HEAT) approach for distributed integrative estimation. HEAT is designed to be both communication- and computation-efficient, and we demonstrate its statistical optimality by establishing convergence rates and the corresponding minimax lower bounds under various integrative losses. To further improve on HEAT, we propose an iterative HEAT (IteHEAT) approach: by iteratively refining the higher-order errors of the HEAT estimators through multi-round communications, IteHEAT achieves geometric contraction rates of convergence. Extensive simulations and real data applications validate the numerical performance of the HEAT and IteHEAT methods.
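The aggregate-then-threshold idea at the core of HEAT can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `heat_sketch`, the equal aggregation weights, and the fixed threshold `tau` are all illustrative placeholders, whereas HEAT itself uses heterogeneity-adjusted weights and data-driven thresholding.

```python
import numpy as np

def heat_sketch(local_precisions, weights=None, tau=0.1):
    """Illustrative sketch: combine local precision-matrix estimates
    with aggregation weights, then hard-threshold small entries to
    recover sparsity. Equal weights and a fixed tau are placeholders,
    not the heterogeneity-adjusted choices analyzed in the paper."""
    K = len(local_precisions)
    if weights is None:
        weights = np.full(K, 1.0 / K)  # placeholder: equal weights
    agg = sum(w * P for w, P in zip(weights, local_precisions))
    # Entrywise hard thresholding of the aggregated estimator
    return np.where(np.abs(agg) >= tau, agg, 0.0)

# Toy usage: three noisy local estimates of a sparse precision matrix
rng = np.random.default_rng(0)
truth = np.diag([2.0, 2.0, 2.0])
truth[0, 1] = truth[1, 0] = 0.5
local_estimates = [truth + 0.02 * rng.standard_normal((3, 3))
                   for _ in range(3)]
est = heat_sketch(local_estimates, tau=0.1)
```

In this toy run the aggregation averages out the site-level noise, and the thresholding step zeroes the entries whose true value is zero while retaining the genuine off-diagonal signal at position (0, 1).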