This paper connects a class of discrete choice models to online learning and multiarmed bandit algorithms. Our contributions are twofold. First, we establish sublinear regret bounds for a broad family of algorithms that includes the Exp3 algorithm as a special case. Second, we introduce a new family of adversarial multiarmed bandit algorithms inspired by the generalized nested logit models introduced by \citet{wen:2001}. These algorithms give users considerable flexibility to fine-tune the model, and they can be implemented efficiently because their sampling probabilities are available in closed form. To illustrate the practical implementation of our algorithms, we present numerical experiments focusing on the stochastic bandit case.
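As an illustration of the Exp3 special case mentioned in the abstract, the following is a minimal Python sketch of a textbook Exp3 variant, in which each arm is sampled from multinomial-logit (softmax) probabilities over importance-weighted reward estimates. It is not taken from the paper: the function names `logit_probs` and `exp3`, the learning rate `eta`, and the Bernoulli arm means in the demo are assumptions made for illustration, and the generalized nested logit family is not shown.

```python
import math
import random


def logit_probs(scores, eta):
    """Multinomial-logit (softmax) choice probabilities over the arms."""
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(eta * (s - m)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exp3(reward_fn, n_arms, horizon, eta, rng):
    """Run a textbook Exp3 loop; reward_fn(t, arm) should return a value in [0, 1]."""
    scores = [0.0] * n_arms  # cumulative importance-weighted reward estimates
    pulls = [0] * n_arms
    total_reward = 0.0
    for t in range(horizon):
        probs = logit_probs(scores, eta)
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        # Only the pulled arm is observed; dividing by its sampling
        # probability keeps the reward estimate unbiased.
        scores[arm] += reward / probs[arm]
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.2, 0.5, 0.8]  # hypothetical Bernoulli arm means (stochastic case)
    horizon = 5000
    eta = math.sqrt(2 * math.log(len(means)) / (horizon * len(means)))
    pulls, total = exp3(
        lambda t, a: float(rng.random() < means[a]), len(means), horizon, eta, rng
    )
    print(pulls, total)
```

The sampling step needs only a closed-form probability vector, which is the property the abstract highlights as making this kind of algorithm cheap to implement.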