Motivated by the phenomenon of strategic agents gaming a recommender system to maximize the number of times they are recommended to users, we study a strategic variant of the linear contextual bandit problem in which the arms can strategically misreport their privately observed contexts to the learner. We treat the algorithm design problem as one of mechanism design under uncertainty and propose the Optimistic Grim Trigger Mechanism (OptGTM), which incentivizes the agents (i.e., arms) to report their contexts truthfully while simultaneously minimizing regret. We also show that failing to account for the agents' strategic behavior results in linear regret; a trade-off between mechanism design and regret minimization, however, appears unavoidable. More broadly, this work aims to provide insight into the intersection of online learning and mechanism design.