Recommendation systems are dynamic economic systems that balance the needs of multiple stakeholders. A recent line of work studies incentives from the content providers' point of view. Content providers, e.g., vloggers and bloggers, contribute fresh content and rely on user engagement to generate revenue and finance their operations. In this work, we propose a contextual multi-armed bandit setting that models content providers' dependency on exposure. In our model, the system receives a user context in every round and must select one of the arms. Each arm is a content provider that must receive a minimum number of pulls in every fixed time period (e.g., a month) to remain viable in later rounds; otherwise, the arm departs and is no longer available. The system aims to maximize the welfare of the users (content consumers). To that end, it should learn which arms are vital and ensure they remain viable by subsidizing arm pulls if needed. We develop algorithms with sub-linear regret, as well as a lower bound demonstrating that our algorithms are optimal up to logarithmic factors.
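The exposure constraint described above can be illustrated with a toy simulation. The following is a minimal sketch of the environment only, not the algorithms developed in this work; the class name `ExposureBanditEnv`, its parameters, and the Bernoulli reward model are assumptions made for illustration.

```python
import random


class ExposureBanditEnv:
    """Toy contextual bandit in which under-exposed arms depart.

    Hypothetical illustration: names and the Bernoulli reward model
    are assumptions, not taken from the paper.
    """

    def __init__(self, mean_rewards, min_pulls, period_length, seed=0):
        # mean_rewards[context][arm]: expected reward in [0, 1]
        self.mean_rewards = mean_rewards
        # min_pulls[arm]: pulls the arm needs per period to stay viable
        self.min_pulls = min_pulls
        self.period_length = period_length
        self.rng = random.Random(seed)
        n_arms = len(mean_rewards[0])
        self.viable = set(range(n_arms))
        self.pulls_this_period = [0] * n_arms
        self.round = 0

    def sample_context(self):
        """Draw a user context uniformly at random."""
        return self.rng.randrange(len(self.mean_rewards))

    def pull(self, context, arm):
        """Pull a viable arm, return a Bernoulli reward, update exposure."""
        if arm not in self.viable:
            raise ValueError(f"arm {arm} has departed")
        p = self.mean_rewards[context][arm]
        reward = 1.0 if self.rng.random() < p else 0.0
        self.pulls_this_period[arm] += 1
        self.round += 1
        if self.round % self.period_length == 0:
            self._end_period()
        return reward

    def _end_period(self):
        """Remove arms that missed their exposure threshold, reset counts."""
        self.viable = {
            a for a in self.viable
            if self.pulls_this_period[a] >= self.min_pulls[a]
        }
        self.pulls_this_period = [0] * len(self.pulls_this_period)
```

In this sketch, a policy that always pulls the same arm for a whole period causes every other arm with a positive threshold to depart. A policy that spends a few pulls on an arm that is currently suboptimal keeps it available for the contexts where it performs best, which is the sense of "subsidizing" pulls used above.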