Online learning to rank (OLTR) is a sequential decision-making problem in which a learning agent selects an ordered list of items and receives feedback through user clicks. Although attacks against OLTR algorithms could cause serious losses in real-world applications, little is known about adversarial attacks on OLTR. This paper studies attack strategies against multiple variants of OLTR. Our first result is an attack strategy against the UCB algorithm for classical stochastic bandits with binary feedback, which resolves key issues arising from bounded and discrete feedback that previous works cannot handle. Building on this result, we design attack algorithms against UCB-based OLTR algorithms in the position-based and cascade models. Finally, we propose a general attack strategy against any algorithm under the general click model. Over a horizon of $T$ rounds, each attack algorithm manipulates the learning agent into choosing the target attack item $T-o(T)$ times while incurring a cumulative cost of only $o(T)$. Experiments on synthetic and real data further validate the effectiveness of our proposed attack algorithms.
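As a rough illustration of why binary feedback constrains the attacker, the sketch below simulates UCB1 on Bernoulli (click / no-click) arms against a naive click-flipping attacker. Because manipulated feedback must stay in {0, 1}, the attacker cannot subtract arbitrary real-valued amounts from rewards. The simplest admissible manipulation is to turn observed clicks on non-target arms into non-clicks. This is not the paper's attack algorithm. The function name `ucb_with_click_flipping`, the arm means, and the unit cost per flipped click are hypothetical choices for illustration, and the baseline assumes the target arm's true click probability is positive.

```python
import math
import random


def ucb_with_click_flipping(means, target, horizon, seed=0):
    """Hypothetical baseline: simulate UCB1 on Bernoulli click arms while an
    attacker flips every observed click on a non-target arm from 1 to 0.

    Returns (target_pulls, attack_cost); attack_cost counts flipped clicks,
    assuming (for illustration only) that each flip costs one unit.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k          # times each arm was chosen
    reward_sums = [0.0] * k  # sum of (possibly manipulated) feedback per arm
    attack_cost = 0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play each arm once to initialise the estimates
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(k),
                key=lambda i: reward_sums[i] / pulls[i]
                + math.sqrt(2 * math.log(t) / pulls[i]),
            )
        click = 1 if rng.random() < means[arm] else 0  # true binary feedback
        if arm != target and click == 1:
            click = 0         # attacker flips the click; feedback stays in {0, 1}
            attack_cost += 1  # one unit of cost per flipped click
        pulls[arm] += 1
        reward_sums[arm] += click

    return pulls[target], attack_cost


target_pulls, cost = ucb_with_click_flipping([0.9, 0.6, 0.3], target=2, horizon=20000)
print(target_pulls, cost)
```

With every non-target arm reporting an empirical mean of 0, UCB1 treats the lowest-quality target arm as the best one and pulls each other arm only logarithmically often. In this simplified setting, both the number of non-target pulls and the number of flipped clicks therefore grow like $O(\log T)$.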