Efficient learning in multi-armed bandit mechanisms such as pay-per-click (PPC) auctions typically involves three challenges: 1) inducing truthful bidding behavior (incentives), 2) exploiting user-level personalization (context), and 3) circumventing manipulations in click patterns (corruptions). Each of these challenges has been studied orthogonally in the literature: incentives have been addressed by a line of work on truthful multi-armed bandit mechanisms, context has been extensively tackled by contextual bandit algorithms, and corruptions have been discussed in a recent line of work on bandits with adversarial corruptions. Since these challenges co-exist, it is important to understand how robust each of these approaches is to the other challenges, to provide algorithms that can handle all of them simultaneously, and to highlight the inherent limitations of this combination. In this work, we show that the most prominent contextual bandit algorithm, $\epsilon$-greedy, can be extended to handle the challenges introduced by strategic arms in the contextual multi-armed bandit mechanism setting. We further show that $\epsilon$-greedy is inherently robust to adversarial data corruption attacks and achieves performance that degrades linearly with the amount of corruption.
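For readers unfamiliar with the $\epsilon$-greedy rule named in the abstract, the following is a minimal sketch of its plain contextual form, assuming a small finite set of contexts and binary click rewards. It is not the mechanism studied in this work: it contains no bids, payments, or corruption model, and the names `EpsilonGreedyContextual`, `simulate`, and `click_prob` are hypothetical, introduced only for illustration.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextual:
    """Tabular epsilon-greedy learner (illustrative sketch, not the paper's mechanism)."""

    def __init__(self, n_arms, epsilon, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Per-context pull counts and running mean rewards, one entry per arm.
        self.counts = defaultdict(lambda: [0] * n_arms)
        self.means = defaultdict(lambda: [0.0] * n_arms)

    def select(self, context):
        # Explore: with probability epsilon, pick an arm uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        # Exploit: pick the arm with the highest estimated mean for this context
        # (ties are broken in favor of the lowest arm index).
        means = self.means[context]
        return max(range(self.n_arms), key=lambda a: means[a])

    def update(self, context, arm, reward):
        # Incremental update of the running mean for (context, arm).
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        self.means[context][arm] += (reward - self.means[context][arm]) / n


def simulate(click_prob, epsilon=0.1, rounds=5000, seed=0):
    """Run the learner against assumed Bernoulli click probabilities.

    click_prob maps each context to a list of per-arm click probabilities.
    """
    n_arms = len(next(iter(click_prob.values())))
    learner = EpsilonGreedyContextual(n_arms, epsilon, seed=seed)
    env_rng = random.Random(seed + 1)
    contexts = list(click_prob)
    for _ in range(rounds):
        context = env_rng.choice(contexts)
        arm = learner.select(context)
        reward = 1.0 if env_rng.random() < click_prob[context][arm] else 0.0
        learner.update(context, arm, reward)
    return learner


if __name__ == "__main__":
    # Made-up click probabilities for two user contexts and two ads.
    learner = simulate({"mobile": [0.9, 0.1], "desktop": [0.1, 0.9]})
    for context, means in learner.means.items():
        print(context, [round(m, 3) for m in means])
```

Because a fixed fraction of rounds is spent on uniform exploration regardless of past observations, every arm keeps being sampled in every context; this is the structural property that makes the rule a natural candidate for the incentive and corruption analyses the abstract describes.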