An online decision-making problem is a learning problem in which a player repeatedly makes decisions so as to minimize the long-term loss. In applications, such problems often have nonlinear combinatorial objective functions, and developing algorithms for them has attracted considerable attention. An existing general framework for handling such objective functions is online submodular minimization. However, practical problems often fall outside the scope of this framework, because the domain of a submodular function is restricted to a subset of the unit hypercube. To overcome this limitation, in this paper we introduce online $\mathrm{L}^{\natural}$-convex minimization, where an $\mathrm{L}^{\natural}$-convex function generalizes a submodular function so that its domain is a subset of the integer lattice. We propose computationally efficient algorithms for online $\mathrm{L}^{\natural}$-convex function minimization in two major settings: the full-information setting and the bandit setting. We analyze the regret of these algorithms and show, in particular, that our algorithm for the full-information setting achieves a regret bound that is tight up to a constant factor. We also present several motivating examples that illustrate the usefulness of online $\mathrm{L}^{\natural}$-convex minimization.
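To make the relationship between $\mathrm{L}^{\natural}$-convexity and submodularity concrete, the following is a minimal sketch, not taken from the paper. It assumes the standard characterization of $\mathrm{L}^{\natural}$-convexity by discrete midpoint convexity, $f(p) + f(q) \ge f(\lceil (p+q)/2 \rceil) + f(\lfloor (p+q)/2 \rfloor)$ for all integer points $p, q$, which reduces to the submodular inequality when $p, q \in \{0,1\}^n$. The function names `is_midpoint_convex` and `example_f` and the choice of test functions are illustrative assumptions.

```python
from itertools import product


def is_midpoint_convex(f, n, lo, hi):
    """Brute-force check of discrete midpoint convexity on the box {lo..hi}^n.

    Checks f(p) + f(q) >= f(ceil((p+q)/2)) + f(floor((p+q)/2)) for every
    pair of points in the box. Only feasible for small n and small boxes.
    """
    points = list(product(range(lo, hi + 1), repeat=n))
    for p in points:
        for q in points:
            # Componentwise rounding of the midpoint, in exact integer arithmetic.
            up = tuple(-((-(a + b)) // 2) for a, b in zip(p, q))
            down = tuple((a + b) // 2 for a, b in zip(p, q))
            if f(p) + f(q) < f(up) + f(down):
                return False
    return True


def example_f(x):
    # Hypothetical example: a convex function of the difference x0 - x1
    # plus separable convex terms, a form known to be L-natural-convex.
    return (x[0] - x[1]) ** 2 + abs(x[0] - 1) + x[1] ** 2


def non_example_f(x):
    # Convex in the continuous sense, but it fails the inequality
    # at p = (1, 0), q = (0, 1).
    return (x[0] + x[1]) ** 2


result_example = is_midpoint_convex(example_f, 2, 0, 3)
result_non_example = is_midpoint_convex(non_example_f, 2, 0, 1)
```

On the box $\{0,1\}^2$ the rounded midpoints are the componentwise maximum and minimum of $p$ and $q$, so the second call is effectively a submodularity test. On larger boxes the same check covers the integer-lattice domains that the abstract describes.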