Multifold Confidence Intervals in Collaborative Mean Estimation (ColME) Using Sample Statistics

The rapid growth of digital devices and the Internet of Things (IoT) has intensified the demand for collaborative learning. Because these devices generate sensitive, high-dimensional data, transmitting it to a central server is often impractical, while purely local learning converges slowly. Collaborative approaches alleviate both issues by letting agents use one another's information to improve their estimates. Each agent faces a personalized learning problem, and collaboration is beneficial among agents whose data are drawn from the same distribution. This paper studies personalized online mean estimation in heterogeneous environments, where each agent observes data from its own σ-sub-Gaussian distribution. Collaborative algorithms enable agents to identify similarity classes in real time and to exploit information from agents in the same class to improve convergence and accuracy. The work builds on existing approaches: collaborative mean estimation (colME) and its graph-based extensions (C-colME and B-colME), which improve scalability and robustness. Because variance estimation plays a crucial role in these algorithms, we propose a method for accurate, local, real-time variance estimation, and we also incorporate estimation of the sample kurtosis. We derive confidence interval (CI) estimators for the sample standard deviation and sample kurtosis. These results are combined with sample colME methods into a unified procedure for constructing multifold CIs based jointly on the sample mean, sample variance, and sample kurtosis. This framework enables colME in challenging scenarios, such as when classes share similar means but differ in variances, or when both means and variances are alike but the underlying distributions differ in higher-order characteristics.
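To make the ingredients above concrete, the following is a minimal Python sketch of how a single agent might maintain local, real-time estimates of the sample mean, variance, and kurtosis, together with a confidence interval for the mean. It is an illustration under stated assumptions, not the procedure derived in this paper: `RunningMoments`, `mean_ci_halfwidth`, and `intervals_overlap` are hypothetical names, the moment updates are the standard one-pass (Welford-style) recurrences, and the interval is the textbook fixed-sample-size bound for σ-sub-Gaussian data with a known σ. The CI estimators for the standard deviation and kurtosis are not reproduced here.

```python
import math


class RunningMoments:
    """One-pass estimates of the mean, variance, and kurtosis of a data stream."""

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running sample mean
        self.m2 = 0.0     # running sum of (x - mean)^2
        self.m3 = 0.0     # running sum of (x - mean)^3
        self.m4 = 0.0     # running sum of (x - mean)^4

    def update(self, x):
        """Fold one new observation into the running central moments."""
        n1 = self.n
        self.n += 1
        n = self.n
        delta = x - self.mean
        delta_n = delta / n
        delta_n2 = delta_n * delta_n
        term1 = delta * delta_n * n1
        self.mean += delta_n
        # Higher moments must be updated before the lower ones they depend on.
        self.m4 += (term1 * delta_n2 * (n * n - 3 * n + 3)
                    + 6 * delta_n2 * self.m2 - 4 * delta_n * self.m3)
        self.m3 += term1 * delta_n * (n - 2) - 3 * delta_n * self.m2
        self.m2 += term1

    def variance(self):
        """Unbiased sample variance (undefined for fewer than two samples)."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

    def kurtosis(self):
        """Sample kurtosis n * M4 / M2^2 (non-excess; about 3 for Gaussian data)."""
        return self.n * self.m4 / (self.m2 ** 2) if self.m2 > 0 else float("nan")


def mean_ci_halfwidth(sigma, n, delta):
    """Half-width of a (1 - delta) CI for the mean of n i.i.d. sigma-sub-Gaussian
    samples. Assumption: fixed-n bound with known sigma, not a time-uniform one."""
    return sigma * math.sqrt(2.0 * math.log(2.0 / delta) / n)


def intervals_overlap(center_a, half_a, center_b, half_b):
    """True if [center_a +/- half_a] and [center_b +/- half_b] intersect."""
    return abs(center_a - center_b) <= half_a + half_b
```

In a collaborative setting, an agent could keep a peer as a candidate member of its similarity class only while their intervals still overlap. Applying the same overlap test to intervals for the variance and kurtosis is what would allow classes with similar means to be told apart.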
