Most work on sequential learning assumes a fixed set of actions that are available at all times. In practice, however, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked, or goods that are out of stock. In this paper we study learning algorithms that can cope with the stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings that differ in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We derive regret bounds for our algorithms in the previously studied full information and (semi-)bandit settings, as well as in a natural middle ground between the two that we call the restricted information setting. A notable consequence of our results is a significant improvement over the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show that they improve over known approaches.
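To make the setting concrete, the following is a minimal sketch of a Follow-The-Perturbed-Leader learner that is restricted to the actions available in each round. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: it treats actions as single arms rather than composite actions, it assumes the full loss vector is revealed after every round, and it does not implement the Counting Asleep Times estimator. The function name `fpl_sleeping`, the learning rate `eta`, and the choice of exponential perturbations are all hypothetical choices made for this example.

```python
import numpy as np

def fpl_sleeping(losses, available, eta=1.0, seed=0):
    """Illustrative FPL over the currently available ("awake") actions.

    losses:    (T, K) array of per-round losses (assumed fully observed).
    available: (T, K) boolean mask; True means the action is awake.
    Returns an int array of chosen actions (-1 if nothing was available).
    """
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    cumulative = np.zeros(K)          # cumulative observed loss per action
    choices = np.full(T, -1)
    for t in range(T):
        awake = np.flatnonzero(available[t])
        if awake.size > 0:
            # Fresh exponential perturbation for every action in every round.
            perturbation = rng.exponential(scale=1.0, size=K)
            scores = eta * cumulative - perturbation
            # Follow the perturbed leader among the awake actions only.
            choices[t] = awake[np.argmin(scores[awake])]
        # Assumption: the whole loss vector is revealed after the round.
        cumulative += losses[t]
    return choices
```

In the restricted information and (semi-)bandit settings described above, the last update would have to use loss estimates in place of `losses[t]`, since only part of the loss vector is observed there.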