Stochastic optimization is a widely used approach for optimization under uncertainty, where uncertain input parameters are modeled by random variables. Exact or approximation algorithms have been obtained for several fundamental problems in this area. However, a significant limitation of this approach is that it requires full knowledge of the underlying probability distributions. Can we still obtain good (approximation) algorithms if these distributions are unknown and the algorithm must learn them through repeated interactions? In this paper, we resolve this question for a large class of "monotone" stochastic problems by providing a generic online learning algorithm with $\sqrt{T \log T}$ regret relative to the best approximation algorithm (under known distributions). Importantly, our online algorithm works in a semi-bandit setting, where in each period the algorithm observes samples only from the random variables that were actually probed. Our framework applies to several fundamental problems in stochastic optimization, such as prophet inequality, Pandora's box, stochastic knapsack, stochastic matchings, and stochastic submodular optimization.