In this paper, we broaden the horizon of online convex optimization (OCO) and consider multi-objective OCO, where there are $K$ distinct loss function sequences, and an algorithm has to choose its action at time $t$ before the $K$ loss functions at time $t$ are revealed. To capture the tradeoff in tracking the $K$ different sequences, we consider the {\it min-max} regret, where the benchmark (optimal offline algorithm) takes a static action across all time slots that minimizes the maximum, over the $K$ sequences, of the total loss (summed across time slots) incurred on each sequence. An online algorithm is allowed to change its action across time slots, and its {\it min-max} regret is defined as the difference between its {\it min-max} cost and that of the benchmark. The {\it min-max} regret is a stringent performance measure, and an algorithm with small regret needs to `track' all loss functions simultaneously. We first show that with adversarial input, the {\it min-max} regret scales linearly with the time horizon $T$ for any online algorithm. Consequently, we consider a stochastic i.i.d. input model, where all loss functions are generated i.i.d. from an unknown joint distribution, and propose a simple algorithm that combines the well-known {\it Hedge} algorithm with online gradient descent (OGD); via a remarkably simple proof, we show that its expected {\it min-max} regret is $O(\sqrt{T \log (T K)})$. Analogous results are also derived for martingale difference and Markov input models.
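The Hedge-plus-OGD combination described above can be illustrated with a minimal numerical sketch. This is not the paper's exact algorithm: the $K=2$ squared-distance losses, the step sizes, and all variable names below are illustrative assumptions. The idea shown is that Hedge maintains a distribution over the $K$ objectives that upweights the currently worst-off sequence, while OGD descends on the resulting weighted loss.

```python
import numpy as np

K, T, d = 2, 2000, 2
centers = np.array([[1.0, 0.0], [-1.0, 0.0]])  # illustrative per-objective targets

def loss(k, x):
    # f_k(x) = ||x - c_k||^2 (assumed quadratic losses for this sketch)
    return float(np.sum((x - centers[k]) ** 2))

def grad(k, x):
    return 2.0 * (x - centers[k])

eta_hedge = np.sqrt(np.log(K) / T)  # standard Hedge step size
eta_ogd = 1.0 / np.sqrt(T)          # standard OGD step size
x = np.array([1.5, 0.5])            # arbitrary starting action
cum = np.zeros(K)                   # cumulative loss per objective

for t in range(T):
    # Hedge: put more weight on objectives with higher cumulative loss
    w = np.exp(eta_hedge * (cum - cum.max()))  # shift for numerical stability
    p = w / w.sum()
    # OGD step on the p-weighted loss (projection onto a bounded set
    # is omitted in this unconstrained sketch)
    g = sum(p[k] * grad(k, x) for k in range(K))
    cum += np.array([loss(k, x) for k in range(K)])
    x = x - eta_ogd * g

# For these two symmetric quadratics, the static min-max optimum is the
# midpoint (the origin), where both losses equal 1 in every round.
minmax_cost = cum.max()
benchmark = T * loss(0, np.zeros(d))
```

In this toy instance the online iterate is pulled toward whichever target currently has the larger cumulative loss, so its min-max cost stays within a sublinear gap of the static benchmark, consistent with the $O(\sqrt{T \log(TK)})$ bound.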