Decentralized Online Convex Optimization with Unknown Feedback Delays

Decentralized online convex optimization (D-OCO), where multiple agents within a network collaboratively learn optimal decisions in real time, arises naturally in applications such as federated learning, sensor networks, and multi-agent control. In this paper, we study D-OCO under unknown, time- and agent-varying feedback delays. While recent work has addressed this problem (Nguyen et al., 2024), existing algorithms assume prior knowledge of the total delay over agents and still suffer from suboptimal dependence on both the delay and network parameters. To overcome these limitations, we propose a novel algorithm that achieves an improved regret bound of $O\left(N\sqrt{d_{\text{tot}}} + \frac{N\sqrt{T}}{(1-\sigma_2)^{1/4}}\right)$, where $T$ is the total horizon, $d_{\text{tot}}$ denotes the average total delay across agents, $N$ is the number of agents, and $1-\sigma_2$ is the spectral gap of the network. Our approach builds upon recent advances in D-OCO (Wan et al., 2024a), but crucially incorporates an adaptive learning rate mechanism via a decentralized communication protocol. This enables each agent to estimate delays locally using a gossip-based strategy, without prior knowledge of the total delay. We further extend our framework to the strongly convex setting and derive a sharper regret bound of $O\left(\frac{N\delta_{\max}\ln T}{\alpha}\right)$, where $\alpha$ is the strong convexity parameter and $\delta_{\max}$ is the maximum number of missing observations averaged over agents. We also show that our upper bounds for both settings are tight up to logarithmic factors. Experimental results validate the effectiveness of our approach, showing improvements over existing benchmark algorithms.
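To illustrate the kind of gossip-based estimation the abstract refers to, the following is a minimal sketch of plain gossip averaging, in which each agent repeatedly mixes its local value with those of its neighbors. It is not the paper's algorithm. The ring topology, the uniform 1/3 weights, the function names, and the delay values are all assumptions made for illustration. With a doubly stochastic weight matrix on a connected network, every agent's estimate approaches the network-wide average, at a speed governed by the spectral gap.

```python
def ring_gossip_matrix(n):
    """Hypothetical doubly stochastic weights for a ring network:
    each agent averages itself with its left and right neighbors."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            P[i][j % n] += 1.0 / 3.0
    return P


def gossip_round(P, x):
    """One communication round: agent i replaces x[i] with a weighted
    average of the values held by its neighbors (and itself)."""
    n = len(x)
    return [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]


def gossip_average(P, x, rounds):
    """Run a fixed number of gossip rounds and return all local estimates."""
    for _ in range(rounds):
        x = gossip_round(P, x)
    return x


# Hypothetical locally observed delay counts, one per agent.
local_delays = [3.0, 0.0, 7.0, 1.0, 4.0]

P = ring_gossip_matrix(len(local_delays))
estimates = gossip_average(P, local_delays, rounds=50)

# Each agent's estimate should be close to the network-wide average,
# which no single agent could compute from its own observation alone.
true_mean = sum(local_delays) / len(local_delays)
max_error = max(abs(e - true_mean) for e in estimates)
```

Because the weight matrix is doubly stochastic, each round preserves the sum of the agents' values, so the only quantity the estimates can converge to is the average. A sparser or more poorly connected network has a smaller spectral gap and needs more rounds to reach the same accuracy.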
