In numerous settings, agents lack sufficient data to directly learn a model. Collaborating with other agents may help, but it introduces a bias-variance trade-off, when local data distributions differ. A key challenge is for each agent to identify clients with similar distributions while learning the model, a problem that remains largely unresolved. This study focuses on a simplified version of the overarching problem, where each agent collects samples from a real-valued distribution over time to estimate its mean. Existing algorithms face impractical space and time complexities (quadratic in the number of agents A). To address scalability challenges, we propose a framework where agents self-organize into a graph, allowing each agent to communicate with only a selected number of peers r. We introduce two collaborative mean estimation algorithms: one draws inspiration from belief propagation, while the other employs a consensus-based approach, with complexity of O( r |A| log |A|) and O(r |A|), respectively. We establish conditions under which both algorithms yield asymptotically optimal estimates and offer a theoretical characterization of their performance.
翻译:在许多场景中,智能体缺乏足够的数据直接学习模型。与其他智能体协作可能有所帮助,但当局部数据分布存在差异时,会引入偏差-方差权衡问题。一个关键挑战是让每个智能体在学习模型的同时识别具有相似数据分布的客户端,这一问题至今尚未得到充分解决。本研究聚焦于该总体问题的简化版本:每个智能体随时间从实数分布中采集样本以估计其均值。现有算法面临不可行的空间和时间复杂度(与智能体数量A呈二次关系)。为解决可扩展性挑战,我们提出一个框架,使智能体自组织成图结构,允许每个智能体仅与选定数量的r个对等体通信。我们引入了两种协作均值估计算法:一种受信念传播启发,另一种采用基于共识的方法,其复杂度分别为O(r |A| log |A|)和O(r |A|)。我们建立了这两种算法在渐近意义下生成最优估计的条件,并对其性能进行了理论刻画。