In online sequential decision-making, we study the problem of delayed feedback within the framework of online convex optimization (OCO), where the feedback on a decision may arrive after an unknown delay. Unlike previous research, which is limited to the Euclidean norm and gradient information, we propose three families of delayed algorithms, based on approximate solutions, that handle different types of received feedback. The proposed algorithms are versatile and apply to general norms. Specifically, we introduce a family of Follow the Delayed Regularized Leader algorithms for feedback that reveals the full loss function, a family of Delayed Mirror Descent algorithms for feedback that reveals gradient information about the loss function, and a family of Simplified Delayed Mirror Descent algorithms for feedback that reveals only the values of the loss function's gradients at the corresponding decision points. For each family, we provide regret bounds under general convexity and under relative strong convexity. We also demonstrate the efficiency of each algorithm under different norms through concrete examples. Furthermore, our theoretical results are consistent with the current best bounds when reduced to the standard settings.
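To make the delayed-feedback setting concrete, here is a minimal sketch of online gradient descent with delayed gradients, i.e. the Euclidean special case of mirror descent. It is an illustration only and not the algorithms proposed in this work: the function names (`project_l2_ball`, `delayed_gradient_descent`), the constant step size, the fixed delay, and the quadratic losses are all assumptions chosen for the example.

```python
def project_l2_ball(x, radius=1.0):
    """Euclidean projection onto the L2 ball of the given radius."""
    norm = sum(v * v for v in x) ** 0.5
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def delayed_gradient_descent(grad_fns, delays, dim, step_size, radius=1.0):
    """Hypothetical sketch: the gradient of round t, evaluated at the
    decision played in round t, only becomes available at the end of
    round t + delays[t]. Returns the list of decisions played."""
    x = [0.0] * dim
    decisions = []
    arrivals = {}  # arrival round -> gradients that become available then
    for t in range(len(grad_fns)):
        decisions.append(list(x))
        # The learner does not see this gradient yet; it is queued.
        g = grad_fns[t](x)
        arrivals.setdefault(t + delays[t], []).append(g)
        # Apply every gradient whose delay has just elapsed.
        for g_arrived in arrivals.pop(t, []):
            x = [xi - step_size * gi for xi, gi in zip(x, g_arrived)]
        x = project_l2_ball(x, radius)
    return decisions


# Assumed toy problem: f_t(x) = 0.5 * ||x - target||^2 in every round.
target = [0.5, -0.3]


def quadratic_grad(x):
    return [xi - ti for xi, ti in zip(x, target)]


T, delay = 200, 3
decisions = delayed_gradient_descent(
    grad_fns=[quadratic_grad] * T,
    delays=[delay] * T,
    dim=2,
    step_size=0.1,
)
```

With a fixed delay of 3, the first four decisions stay at the origin because no feedback has arrived yet; after that, the iterates move toward `target` using gradients that are three rounds stale. Replacing the Euclidean update and projection with a Bregman-divergence step would be the natural route to other norms, but that generalization is not shown here.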