We study the online decision making problem (ODMP) as a natural generalization of online linear programming. In ODMP, a single decision maker makes a sequence of decisions over $T$ time steps. At each time step, the decision maker makes a locally feasible decision based on the information available up to that point. The objective is to maximize the accumulated reward while satisfying a set of convex global constraints, called goal constraints. The decision made at each step yields an $m$-dimensional vector representing that local decision's contribution to the goal constraints. In the online setting, the goal constraints are soft constraints that may be moderately violated. To handle potential nonconvexity and nonlinearity in ODMP, we propose a Fenchel dual-based online algorithm. At each time step, the algorithm solves a potentially nonconvex optimization problem over the local feasible set and a convex optimization problem over the goal set. Under certain stochastic input models, we show that the algorithm achieves $O(\sqrt{mT})$ goal constraint violation deterministically and $\tilde{O}(\sqrt{mT})$ regret in expected reward. We conduct numerical experiments on an online knapsack problem and an assortment optimization problem to demonstrate the potential of the proposed online algorithm.
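To make the dual-based pattern described above concrete, the following is a minimal sketch, not the paper's actual algorithm, of how a Fenchel/Lagrangian dual price can drive online decisions on the online knapsack special case. All names (`dual_online_knapsack`, the step size `eta`, the data-generating setup) are illustrative assumptions: at each step the local subproblem (here a binary accept/reject) is solved against the current dual prices, and the prices are then updated by a projected subgradient step toward the per-step budget rate.

```python
import numpy as np

def dual_online_knapsack(rewards, weights, budget, eta=0.1):
    """Illustrative dual-price sketch (not the paper's algorithm).

    At step t, accept item t iff its reward exceeds the dual price of
    the resources it would consume; then update the dual prices by a
    projected subgradient step comparing consumption to the per-step
    budget rate. Violation of the budget is soft, as in ODMP.
    """
    T, m = weights.shape
    lam = np.zeros(m)        # dual prices, one per resource
    rate = budget / T        # per-step budget allowance
    used = np.zeros(m)
    total_reward = 0.0
    decisions = []
    for t in range(T):
        # local subproblem: maximize r_t * x - lam . (a_t * x) over x in {0, 1}
        x = 1 if rewards[t] - lam @ weights[t] > 0 else 0
        total_reward += rewards[t] * x
        used += weights[t] * x
        decisions.append(x)
        # dual update: projected subgradient step, keeping prices nonnegative
        lam = np.maximum(0.0, lam + eta * (weights[t] * x - rate))
    return decisions, total_reward, used

# Synthetic instance (illustrative data, fixed seed for reproducibility).
rng = np.random.default_rng(0)
T, m = 200, 3
rewards = rng.uniform(0.0, 1.0, T)
weights = rng.uniform(0.0, 1.0, (T, m))
budget = np.full(m, 0.25 * T)    # average allowance of 0.25 per step per resource
dec, rew, used = dual_online_knapsack(rewards, weights, budget)
```

In this sketch the binary subproblem plays the role of the (potentially nonconvex) local optimization, while the projection onto the nonnegative orthant stands in for the convex step over the goal set; the $O(\sqrt{mT})$-type guarantees in the abstract apply to the paper's algorithm, not to this toy version.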