This paper introduces a dual-based algorithmic framework for solving regularized online resource allocation problems, which may have non-concave cumulative rewards, hard resource constraints, and a non-separable regularizer. Under a strategy of adaptively updating the resource constraints, the proposed framework requires only approximate solutions to the empirical dual problems, up to a certain accuracy, and yet achieves optimal logarithmic regret under a local second-order growth condition. Surprisingly, a delicate analysis of the dual objective function enables us to eliminate the notorious log-log factor in the regret bound. The flexibility of the framework makes well-known and computationally fast algorithms, such as dual stochastic gradient descent, immediately applicable. Additionally, an infrequent re-solving scheme is proposed that significantly reduces the computational cost without compromising the optimal regret. A worst-case square-root regret lower bound is established for the case in which the resource constraints are not adaptively updated during dual optimization, which underscores the critical role of adaptive dual variable updates. Comprehensive numerical experiments demonstrate the merits of the proposed framework.
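To make the idea of dual stochastic gradient descent with adaptively updated resource constraints concrete, the following is a minimal sketch and not the algorithm analyzed in the paper. It assumes a simplified setting that the abstract does not specify: linear rewards, binary accept/reject decisions, and no regularizer. The function name `dual_sgd_allocation`, the request format, and the `1/sqrt(t)` step size are all hypothetical choices made for illustration. The adaptive element is the per-period target `remaining / periods_left`, which is recomputed from the remaining budget at every step instead of being fixed at `budget / T`.

```python
import random


def dual_sgd_allocation(requests, budget, step_size=None):
    """Illustrative dual SGD for online allocation (hypothetical, simplified).

    requests: list of (reward, usage) pairs, where usage is a list with one
              non-negative entry per resource.
    budget:   list of initial resource capacities.
    Returns (total_reward, remaining_budget, decisions).
    """
    horizon = len(requests)
    remaining = list(budget)
    price = [0.0] * len(budget)  # dual variables, one per resource
    total_reward = 0.0
    decisions = []

    for t, (reward, usage) in enumerate(requests):
        periods_left = horizon - t
        # Adaptive constraint: average budget still available per period.
        target = [b / periods_left for b in remaining]

        # Assumed step size; other schedules are possible.
        eta = step_size if step_size is not None else 1.0 / (t + 1) ** 0.5

        # Accept when the reward exceeds the priced resource cost
        # and the hard resource constraints would not be violated.
        cost = sum(p * a for p, a in zip(price, usage))
        feasible = all(a <= b for a, b in zip(usage, remaining))
        accept = 1 if (reward > cost and feasible) else 0

        if accept:
            total_reward += reward
            remaining = [b - a for b, a in zip(remaining, usage)]

        # Projected stochastic subgradient step on the dual variables:
        # prices rise when consumption exceeds the adaptive target.
        price = [
            max(0.0, p - eta * (d - a * accept))
            for p, d, a in zip(price, target, usage)
        ]
        decisions.append(accept)

    return total_reward, remaining, decisions


if __name__ == "__main__":
    random.seed(0)
    demo = [(random.random(), [random.random(), random.random()])
            for _ in range(1000)]
    reward, left, _ = dual_sgd_allocation(demo, budget=[100.0, 100.0])
    print(reward, left)
```

Replacing `target` with the fixed vector `budget / horizon` gives the non-adaptive variant, which is the kind of scheme the square-root lower bound in the abstract refers to.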