We study the problem of collaboratively learning least squares estimates for $m$ agents. Each agent observes a different subset of the features, e.g., data collected from sensors of varying resolution. Our goal is to determine how to coordinate the agents in order to produce the best estimator for each agent. We propose a distributed, semi-supervised algorithm, Collab, consisting of three steps: local training, aggregation, and distribution. Our procedure does not require communicating the labeled data, making it communication efficient and useful in settings where the labeled data is inaccessible. Despite this handicap, our procedure is nearly asymptotically local minimax optimal, even among estimators allowed to communicate the labeled data, such as imputation methods. We test our method on real and synthetic data.
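The three steps named in the abstract (local training, aggregation, distribution) can be illustrated with a minimal sketch. This is not the Collab procedure itself: the abstract does not give the estimator's form, so everything below is an assumption made for illustration. In particular, the sketch assumes a linear model with zero-mean features, a feature covariance `Sigma` that is known to the aggregator (in a semi-supervised setting it might be estimated from unlabeled data), and an unweighted least squares aggregation. The function names `local_fit`, `projection`, `aggregate`, and `distribute` are hypothetical.

```python
import numpy as np


def local_fit(X_obs, y):
    """Step 1 (local training): ordinary least squares on the features one agent observes."""
    coef, *_ = np.linalg.lstsq(X_obs, y, rcond=None)
    return coef


def projection(Sigma, idx):
    """Matrix T such that T @ theta is the population least squares coefficient
    when only the features in idx are observed (assumes zero-mean features)."""
    return np.linalg.solve(Sigma[np.ix_(idx, idx)], Sigma[idx, :])


def aggregate(local_coefs, feature_sets, Sigma):
    """Step 2 (aggregation): find the global theta whose projections best match
    the local fits. Unweighted least squares is an illustrative choice."""
    A = np.vstack([projection(Sigma, idx) for idx in feature_sets])
    b = np.concatenate(local_coefs)
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta


def distribute(theta, feature_sets, Sigma):
    """Step 3 (distribution): return to each agent the projection of theta
    onto the features that agent observes."""
    return [projection(Sigma, idx) @ theta for idx in feature_sets]


# Synthetic example: 3 agents, 4 features, each agent sees a different subset.
rng = np.random.default_rng(0)
d, n = 4, 2000
theta_true = np.array([1.0, -2.0, 0.5, 3.0])
Sigma = 0.5 * np.eye(d) + 0.5 * np.ones((d, d))  # correlated features
feature_sets = [[0, 1], [1, 2, 3], [0, 3]]

local_coefs = []
for idx in feature_sets:
    # Each agent draws its own labeled sample but keeps only its observed columns.
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    y = X @ theta_true + 0.1 * rng.standard_normal(n)
    local_coefs.append(local_fit(X[:, idx], y))

# Only the local coefficient vectors are communicated, never the labels.
theta_hat = aggregate(local_coefs, feature_sets, Sigma)
per_agent = distribute(theta_hat, feature_sets, Sigma)
```

In this sketch the agents send only their fitted coefficients, which mirrors the abstract's point that the labeled data need not be communicated. The aggregation is well posed here because the feature subsets jointly cover all features and `Sigma` is positive definite.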