Algorithmic \emph{replicability} has recently been introduced to address the need for reproducible experiments in machine learning. A \emph{replicable online learning} algorithm is one that, with high probability, takes the same sequence of decisions across different executions in the same environment. We initiate the study of algorithmic replicability in \emph{constrained} multi-armed bandit (MAB) problems, where a learner interacts with an unknown stochastic environment for $T$ rounds, seeking not only to maximize reward but also to satisfy multiple constraints. Our main result is that replicability can be achieved in constrained MABs. Specifically, we design replicable algorithms whose regret and constraint violation match those of non-replicable algorithms in their dependence on $T$. As a key step toward these guarantees, we develop the first replicable UCB-like algorithm for \emph{unconstrained} MABs, showing that algorithms that employ the optimism-in-the-face-of-uncertainty principle can be replicable, a result that we believe is of independent interest.
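A minimal formal sketch of the replicability notion described above, under assumptions not stated in this abstract: the symbols $\rho$, $\xi$, $a_t$, and $a_t'$ are illustrative, and the two executions are assumed to share the learner's internal randomness $\xi$ while observing independently drawn feedback from the same environment. Under that reading, an algorithm would be called $\rho$-replicable if
\[
  \Pr\bigl[(a_1,\dots,a_T) = (a_1',\dots,a_T')\bigr] \;\ge\; 1-\rho,
\]
where $a_t$ and $a_t'$ denote the decisions taken in round $t$ of the two executions, and the probability is over $\xi$ and both feedback sequences.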