In numerous settings, agents lack sufficient data to learn a model on their own. Collaborating with other agents may help, but it introduces a bias-variance trade-off when local data distributions differ. A key challenge is for each agent to identify agents with similar distributions while learning the model, a problem that remains largely unresolved. This study focuses on a simplified version of the overarching problem, in which each agent collects samples from a real-valued distribution over time to estimate its mean. Existing algorithms suffer from impractical space and time complexities (quadratic in the number of agents |A|). To address this scalability challenge, we propose a framework in which agents self-organize into a graph, allowing each agent to communicate with only a limited number of peers r. We introduce two collaborative mean estimation algorithms: one inspired by belief propagation, the other employing a consensus-based approach, with complexities O(r|A| log |A|) and O(r|A|), respectively. We establish conditions under which both algorithms yield asymptotically optimal estimates and provide a theoretical characterization of their performance.
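As a rough illustration of the consensus-based idea (not the paper's actual algorithm), the sketch below runs repeated neighbor averaging on a sparse r-regular communication graph: each of |A| agents updates its estimate using only its r neighbors, so one round costs O(r |A|). The function name, the uniform-weight update, and the dictionary-based graph representation are illustrative assumptions.

```python
def consensus_mean(values, neighbors, steps=100):
    """Illustrative consensus averaging on a sparse agent graph.

    values:    dict mapping agent -> initial local mean estimate
    neighbors: dict mapping agent -> list of neighboring agents
    Each round costs O(r * |A|) on an r-regular graph. With uniform
    weights 1/(1+r), the update matrix is doubly stochastic on a
    regular graph, so estimates converge to the global average.
    """
    est = dict(values)
    for _ in range(steps):
        new = {}
        for agent, nbrs in neighbors.items():
            # Average own estimate with neighbors' estimates.
            # (Irregular graphs would need e.g. Metropolis weights.)
            new[agent] = (est[agent] + sum(est[b] for b in nbrs)) / (1 + len(nbrs))
        est = new
    return est


# Example: 4 agents on a ring (2-regular); estimates converge to the
# global mean (0 + 1 + 2 + 3) / 4 = 1.5.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
result = consensus_mean({0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0}, ring)
```

Note that this toy version averages over all agents; the setting in the paper additionally requires each agent to restrict collaboration to agents with similar distributions, which this sketch does not address.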