The $K$-armed dueling bandits problem, where the feedback is in the form of noisy pairwise preferences, has been widely studied due to its applications in information retrieval, recommendation systems, etc. Motivated by concerns that user preferences/tastes can evolve over time, we consider the problem of dueling bandits with distribution shifts. Specifically, we study the recent notion of significant shifts (Suk and Kpotufe, 2022), and ask whether one can design an adaptive algorithm for the dueling problem with $O(\sqrt{K\tilde{L}T})$ dynamic regret, where $\tilde{L}$ is the (unknown) number of significant shifts in preferences. We show that the answer to this question depends on the properties of the underlying preference distributions. First, we give an impossibility result that rules out any algorithm with $O(\sqrt{K\tilde{L}T})$ dynamic regret under the well-studied Condorcet and SST classes of preference distributions. Second, we show that $\text{SST} \cap \text{STI}$ is the largest amongst popular classes of preference distributions where it is possible to design such an algorithm. Overall, our results provide an almost complete resolution of the above question for the hierarchy of distribution classes.
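To make the hierarchy of classes mentioned above concrete, the following is a minimal sketch that checks whether a fixed preference matrix has a Condorcet winner, satisfies strong stochastic transitivity (SST), or lies in $\text{SST} \cap \text{STI}$. It assumes the standard definitions from the dueling bandits literature, which the abstract itself does not spell out: `P[i][j]` is the probability that arm `i` beats arm `j`; SST requires `P[i][k] >= max(P[i][j], P[j][k])` and STI requires `P[i][k] - 1/2 <= (P[i][j] - 1/2) + (P[j][k] - 1/2)` for every triple ranked `i`, `j`, `k` under some total order. All function names and example matrices are hypothetical, and the brute-force search over orderings is only meant for very small $K$.

```python
from itertools import permutations

TOL = 1e-12  # tolerance for floating-point comparisons


def condorcet_winner(P):
    """Return an arm that beats every other arm with probability > 1/2, else None."""
    K = len(P)
    for i in range(K):
        if all(P[i][j] > 0.5 for j in range(K) if j != i):
            return i
    return None


def _triples(order):
    """Yield all triples (i, j, k) with i ranked above j ranked above k."""
    K = len(order)
    for a in range(K):
        for b in range(a + 1, K):
            for c in range(b + 1, K):
                yield order[a], order[b], order[c]


def _consistent(P, order):
    """Check that every arm beats all lower-ranked arms with probability >= 1/2."""
    K = len(order)
    return all(P[order[a]][order[b]] >= 0.5 - TOL
               for a in range(K) for b in range(a + 1, K))


def sst_orders(P):
    """Brute force: list every total order (best arm first) under which P is SST."""
    return [order for order in permutations(range(len(P)))
            if _consistent(P, order)
            and all(P[i][k] >= max(P[i][j], P[j][k]) - TOL
                    for i, j, k in _triples(order))]


def in_sst_and_sti(P):
    """True if some SST order also satisfies the stochastic triangle inequality."""
    return any(all(P[i][k] - 0.5 <= (P[i][j] - 0.5) + (P[j][k] - 0.5) + TOL
                   for i, j, k in _triples(order))
               for order in sst_orders(P))


# Hypothetical 3-arm examples (illustrative values only).
smooth = [[0.5, 0.6, 0.7], [0.4, 0.5, 0.65], [0.3, 0.35, 0.5]]          # SST and STI
condorcet_only = [[0.5, 0.6, 0.55], [0.4, 0.5, 0.9], [0.45, 0.1, 0.5]]  # winner, not SST
cyclic = [[0.5, 0.6, 0.4], [0.4, 0.5, 0.6], [0.6, 0.4, 0.5]]            # no Condorcet winner
```

In this sketch, `smooth` lies in $\text{SST} \cap \text{STI}$, `condorcet_only` has a Condorcet winner but violates SST, and `cyclic` has no Condorcet winner at all. The checks apply to a single stationary preference matrix; they do not model shifts over time or any algorithm from the paper.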