We propose a framework for adaptive data-centric collaborative learning among self-interested agents, coordinated by an arbiter. Designed to handle the incremental nature of real-world data, the framework operates in an online manner: at each step, the arbiter collects a batch of data from agents, trains a machine learning model, and provides each agent with a distinct model reflecting its data contributions. This setup establishes a feedback loop in which shared data influence model updates, and the resulting models guide future data-sharing strategies. Agents evaluate and partition their data, selecting a partition to share using a stochastic parameterized policy, optimized via policy gradient methods to maximize the utility of the received model as defined by agent-specific evaluation functions. On the arbiter side, the expected loss over the true data distribution is minimized, with agent-specific weights accounting for distributional differences arising from diverse sources and selective sharing. A bilevel optimization algorithm jointly learns the model parameters and the agent-specific weights. Mean-zero noise, computed using a distortion function that adjusts these agent-specific weights, is introduced to generate distinct agent-specific models, promoting valuable data sharing without requiring separate training per agent. Our framework is underpinned by non-asymptotic analyses, ensuring convergence of the agent-side policy optimization to an approximate stationary point of the evaluation functions and convergence of the arbiter-side optimization to an approximate stationary point of the expected loss function.
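To make the agent-side mechanism concrete, the following is a minimal sketch of one policy-gradient (REINFORCE-style) update for an agent's stochastic data-sharing policy. All names here are illustrative assumptions, not the paper's notation: the policy is a simple Bernoulli model over partitions, `features` stands in for per-partition statistics, and `evaluate_model` abstracts the full arbiter round trip (the agent's evaluation function applied to the model it receives back).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reinforce_step(theta, features, evaluate_model, lr=0.1):
    """One policy-gradient step for an agent's data-sharing policy.

    theta          -- policy parameters (one weight per partition feature)
    features       -- (n_partitions, d) matrix of per-partition statistics
    evaluate_model -- hypothetical stand-in for the arbiter round trip:
                      maps the shared subset to the utility of the model
                      the agent receives back
    """
    probs = sigmoid(features @ theta)                    # P(share partition i)
    actions = (rng.random(len(probs)) < probs).astype(float)
    reward = evaluate_model(actions)
    # REINFORCE: gradient of the log-probability of the Bernoulli
    # choices, scaled by the scalar reward (no baseline, for brevity)
    grad = features.T @ (actions - probs)
    return theta + lr * reward * grad

# Toy run: the (assumed) utility rewards sharing partitions whose
# first feature is positive, so the policy should learn to favor them.
features = rng.normal(size=(8, 3))
theta = np.zeros(3)
for _ in range(200):
    theta = reinforce_step(
        theta, features,
        lambda a: float(a @ (features[:, 0] > 0)))
```

In the full framework this inner loop would alternate with the arbiter's bilevel update of model parameters and agent-specific weights; the sketch only isolates the agent-side stationary-point analysis's object of study, the parameterized sharing policy.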