We propose a framework for adaptive data-centric collaborative learning among self-interested agents, coordinated by an arbiter. Designed to handle the incremental nature of real-world data, the framework operates in an online manner: at each step, the arbiter collects a batch of data from agents, trains a machine learning model, and provides each agent with a distinct model reflecting its data contributions. This setup establishes a feedback loop in which shared data influence model updates and the resulting models guide future data-sharing strategies. Agents evaluate and partition their data, selecting a partition to share using a stochastic parameterized policy, optimized via policy gradient methods to maximize the utility of the received model as defined by agent-specific evaluation functions. On the arbiter side, the expected loss over the true data distribution is minimized, with agent-specific weights accounting for distributional differences arising from diverse sources and selective sharing. A bilevel optimization algorithm jointly learns the model parameters and agent-specific weights. Mean-zero noise, computed using a distortion function that adjusts these agent-specific weights, is introduced to generate distinct agent-specific models, promoting valuable data sharing without requiring separate training per agent. Our framework is underpinned by non-asymptotic analyses ensuring that the agent-side policy optimization converges to an approximate stationary point of the evaluation functions and that the arbiter-side optimization converges to an approximate stationary point of the expected loss.
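A minimal sketch of the online loop described above, under several illustrative assumptions that are not from the paper itself: squared loss, a per-point Bernoulli sharing policy with a logistic score, uniform agent-specific weights, a simple REINFORCE update on the agent side, and a noise scale tied to the agent weight. All names, learning rates, and the toy data here are hypothetical choices for illustration, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def agent_share(X, y, theta):
    """Stochastic parameterized policy: point i is shared with prob sigmoid(x_i . theta)."""
    p = sigmoid(X @ theta)
    mask = rng.random(len(y)) < p
    return mask, p

def arbiter_step(w, shared, alphas, lr=0.05):
    """One weighted gradient step on pooled shared data (squared loss),
    with agent-specific weights alphas correcting for selective sharing."""
    grad = np.zeros_like(w)
    for (X, y), a in zip(shared, alphas):
        if len(y) == 0:
            continue
        grad += a * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def distorted_model(w, a, scale=0.05):
    """Agent-specific model: shared model plus mean-zero noise whose
    magnitude is a (hypothetical) distortion of the agent weight a."""
    return w + rng.normal(0.0, scale * (1.0 - min(a, 1.0)), size=w.shape)

# --- toy run: two agents drawing from the same linear model ---
d, n = 3, 200
w_true = np.array([1.0, -2.0, 0.5])
data = []
for _ in range(2):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    data.append((X, y))

w = np.zeros(d)                       # arbiter's shared model
thetas = [np.zeros(d), np.zeros(d)]   # agent-side policy parameters
alphas = [1.0, 1.0]                   # agent-specific weights (fixed here)

for step in range(200):
    shared, traces = [], []
    for (X, y), th in zip(data, thetas):
        mask, p = agent_share(X, y, th)
        shared.append((X[mask], y[mask]))
        traces.append((X, mask, p))
    w = arbiter_step(w, shared, alphas)
    # Agent-side REINFORCE: reward = negative eval loss of the received model.
    for i, ((X, y), (Xf, mask, p)) in enumerate(zip(data, traces)):
        w_i = distorted_model(w, alphas[i])
        reward = -np.mean((X @ w_i - y) ** 2)
        glogp = Xf.T @ (mask - p)     # grad of log-prob of the sampled mask
        thetas[i] = thetas[i] + 0.001 * reward * glogp

final_loss = np.mean((data[0][0] @ w - data[0][1]) ** 2)
```

In this toy run the feedback loop is visible in miniature: each round, the sampled sharing masks determine the arbiter's update, and the utility of the returned (distorted) model drives the next policy-gradient step on each agent's sharing parameters.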