We consider the problem of decision-making under uncertainty in an environment with safety constraints. Many business and industrial applications rely on real-time optimization to improve key performance indicators. When the characteristics of the system are unknown, real-time optimization becomes challenging, particularly because the safety constraints must still be satisfied. We propose the ARTEO algorithm, in which we cast multi-armed bandits as a mathematical programming problem subject to safety constraints and learn the unknown characteristics through exploration while optimizing the targets. We quantify the uncertainty in the unknown characteristics using Gaussian processes and incorporate it into the cost function as a contribution that drives exploration. We adaptively control the size of this contribution in accordance with the requirements of the environment. We guarantee the safety of our algorithm with high probability through confidence bounds constructed under the regularity assumptions of Gaussian processes. We demonstrate the safety and efficiency of our approach with two case studies: optimization of electric motor current and real-time bidding. We further compare the performance of ARTEO with a safe variant of upper confidence bound based algorithms. ARTEO achieves lower cumulative regret while making accurate and safe decisions.
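To make the idea concrete, the following is a minimal sketch of one decision step and not the ARTEO implementation itself. It assumes a single scalar decision variable, a zero-mean Gaussian process with a squared-exponential kernel, a tracking cost |mean - target|, and a safety rule that the upper confidence bound of the same unknown function must stay below a limit. The names `rbf_kernel`, `gp_posterior`, `choose_action`, and the parameters `z` (size of the exploration contribution) and `beta` (confidence-bound width) are hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    prior_var = rbf_kernel(x_query, x_query).diagonal()
    var = prior_var - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    # Clip tiny negative values caused by floating-point round-off.
    return mean, np.sqrt(np.clip(var, 0.0, None))


def choose_action(x_train, y_train, candidates, target, limit, z=1.0, beta=2.0):
    """Pick the candidate minimising |mean - target| - z * std,
    restricted to candidates whose upper bound mean + beta * std <= limit."""
    mean, std = gp_posterior(x_train, y_train, candidates)
    safe = mean + beta * std <= limit  # assumed form of the safety constraint
    if not safe.any():
        raise ValueError("no candidate satisfies the safety bound")
    # Subtracting z * std rewards uncertain points, which drives exploration.
    cost = np.abs(mean - target) - z * std
    cost = np.where(safe, cost, np.inf)  # unsafe candidates are never selected
    return candidates[np.argmin(cost)]


# Illustrative data only: two observations of an unknown function.
x_obs = np.array([0.0, 1.0])
y_obs = np.array([0.0, 1.0])
grid = np.linspace(0.0, 3.0, 31)
action = choose_action(x_obs, y_obs, grid, target=2.0, limit=1.5)
```

In this sketch, raising `z` puts more weight on uncertain candidates and lowering it toward zero makes the choice purely exploitative, which loosely mirrors the adaptive control of the exploration contribution described above. How that weight is adapted over time is left out here.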