Capacity-Constrained Online Convex Optimization with Delayed Feedback

Online learning with delayed feedback typically assumes that the learner can track all pending rounds until their feedback arrives. In practice, tracking resources are finite, and feedback from untracked rounds is permanently lost. In this paper, we study delayed online convex optimization (OCO) under a hard capacity constraint, where at most $C$ pending rounds can be tracked at any time. To model delay information, we introduce a semi-clairvoyant model that refines the clairvoyant assumption from prior work: rather than requiring delays to be known at prediction time, the learner observes delay expirations online, consistent with the classical unconstrained delayed setting. Our approach proceeds via a reduction to a novel ``delayed and weighted'' OCO problem, using a scheduler that randomizes tracking decisions and importance-weights the resulting observations. For this base problem, we propose and analyze Delayed-Weighted FTRL and its bandit analogue, establishing regret bounds that explicitly characterize the interaction between time-varying weights and delayed feedback. Combining these base learners with our schedulers yields the first regret guarantees for capacity-constrained OCO under convex and strongly convex losses, for both first-order and bandit feedback. For first-order feedback, capacity $C = \Omega(\log T)$ suffices to recover standard delayed OCO rates up to logarithmic factors. For bandit feedback, the regret rates are modulated by powers of $(1 + \sigma_{\text{max}}/C)$, where $\sigma_{\text{max}}$ is the maximum number of pending observations at any time. This allows the regret bound to degrade gracefully when $C < \sigma_{\text{max}}$, while remaining sublinear.
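To make the capacity constraint concrete, the following minimal sketch simulates a delay sequence and reports two quantities from the abstract: $\sigma_{\text{max}}$, the maximum number of pending observations at any time, and the number of rounds whose feedback is lost when at most $C$ pending rounds can be tracked. It is an illustration only. The function name `simulate_tracking`, the convention that feedback for round $t$ with delay $d$ arrives at the end of round $t + d$, and the greedy first-come tracking rule are all assumptions made here; in particular, the greedy rule is a naive baseline and is not the randomized, importance-weighted scheduler described above.

```python
import heapq


def simulate_tracking(delays, capacity):
    """Simulate pending feedback under a hard tracking capacity.

    Assumed convention: feedback for round t with delay d arrives at the
    end of round t + d, so the round counts as pending during t..t+d.

    Returns (sigma_max, lost):
      sigma_max: max number of pending rounds at any time, ignoring capacity.
      lost: rounds dropped by a greedy tracker that holds at most
            `capacity` pending rounds (a naive baseline, not the
            randomized scheduler from the text).
    """
    pending_all = []      # arrival times of all pending rounds
    pending_tracked = []  # arrival times of rounds the tracker holds
    sigma_max = 0
    lost = 0
    for t, d in enumerate(delays):
        # Release every round whose feedback arrived before round t.
        while pending_all and pending_all[0] < t:
            heapq.heappop(pending_all)
        while pending_tracked and pending_tracked[0] < t:
            heapq.heappop(pending_tracked)

        # Round t becomes pending.
        heapq.heappush(pending_all, t + d)
        sigma_max = max(sigma_max, len(pending_all))

        # Greedy rule: track only if a slot is free, otherwise the
        # feedback for this round is permanently lost.
        if len(pending_tracked) < capacity:
            heapq.heappush(pending_tracked, t + d)
        else:
            lost += 1
    return sigma_max, lost


if __name__ == "__main__":
    delays = [3, 3, 3, 3, 0, 0]
    C = 2
    sigma_max, lost = simulate_tracking(delays, C)
    print("sigma_max:", sigma_max)
    print("lost rounds:", lost)
    print("1 + sigma_max / C:", 1 + sigma_max / C)
```

On this toy sequence with $C = 2$, four rounds are pending at once, so $C < \sigma_{\text{max}}$ and the greedy tracker drops two rounds. This is the regime in which the bandit bounds above are said to degrade by powers of $(1 + \sigma_{\text{max}}/C)$.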
