Online linear programming (OLP) has attracted significant attention from both researchers and practitioners because of its wide range of applications, such as online auctions, network revenue management, and advertising. Existing OLP algorithms fall into two categories: LP-based algorithms and LP-free algorithms. The former typically guarantee better performance, even achieving constant regret, but require solving a large number of LPs, which can be computationally expensive. In contrast, LP-free algorithms require only first-order computations but offer weaker guarantees and lack a constant regret bound. In this work, we study the case where the inputs are drawn from an unknown finite-support distribution, and we bridge the gap between these two extremes by proposing an algorithm that achieves constant regret while solving LPs only $O(\log\log T)$ times over the time horizon $T$. Moreover, when LPs may be solved only $M$ times, we propose an algorithm that guarantees $O\left(T^{(1/2+\epsilon)^{M-1}}\right)$ regret. Furthermore, when the arrival probabilities are known at the beginning, our algorithm guarantees constant regret by solving LPs $O(\log\log T)$ times, and $O\left(T^{(1/2+\epsilon)^{M}}\right)$ regret by solving LPs only $M$ times. Numerical experiments demonstrate the efficiency of the proposed algorithms.
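One way to see that the two guarantees are consistent is to look only at the exponent of $T$. Ignoring hidden constants, $T^{(1/2+\epsilon)^{M-1}} \le e$ holds exactly when $(1/2+\epsilon)^{M-1}\log T \le 1$, that is, when $M \ge 1 + \log\log T / \log\frac{1}{1/2+\epsilon}$, which is $O(\log\log T)$ for fixed $\epsilon \in (0, 1/2)$. The short Python sketch below works through this arithmetic. It is not the proposed algorithm; the function names `regret_exponent` and `min_solves_for_constant_bound` are hypothetical, and the threshold $e$ is an arbitrary choice of constant made for illustration.

```python
import math


def regret_exponent(M, eps, known_probabilities=False):
    """Exponent of T in the stated regret bound when LPs are solved M times.

    Unknown arrival probabilities: (1/2 + eps) ** (M - 1).
    Known arrival probabilities:   (1/2 + eps) ** M.
    """
    if not 0.0 < eps < 0.5:
        raise ValueError("eps is assumed to lie in (0, 1/2)")
    power = M if known_probabilities else M - 1
    return (0.5 + eps) ** power


def min_solves_for_constant_bound(T, eps, known_probabilities=False):
    """Smallest M for which T ** exponent <= e (hidden constants ignored).

    T ** x <= e is equivalent to x * ln(T) <= 1, so we avoid computing
    the power directly and compare in log space instead.
    """
    if T <= 1:
        raise ValueError("T is assumed to be greater than 1")
    M = 1
    while regret_exponent(M, eps, known_probabilities) * math.log(T) > 1.0:
        M += 1
    return M


if __name__ == "__main__":
    for T in (10**3, 10**6, 10**12):
        m_unknown = min_solves_for_constant_bound(T, eps=0.1)
        m_known = min_solves_for_constant_bound(T, eps=0.1, known_probabilities=True)
        print(f"T={T:>14}: M={m_unknown} (unknown probs), M={m_known} (known probs)")
```

Under these assumptions, the required number of LP solves grows very slowly with $T$, and the known-probability setting needs one fewer solve because its exponent uses $M$ rather than $M-1$.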