Recommendation systems are dynamic economic systems that balance the needs of multiple stakeholders. A recent line of work studies incentives from the content providers' point of view. Content providers, e.g., vloggers and bloggers, contribute fresh content and rely on user engagement to generate revenue and finance their operations. In this work, we propose a contextual multi-armed bandit setting to model the dependency of content providers on exposure. In our model, the system receives a user context in every round and must select one of the arms. Every arm is a content provider who must receive a minimum number of pulls in every fixed time period (e.g., a month) to remain viable in later rounds; otherwise, the arm departs and is no longer available. The system aims to maximize the welfare of the users (content consumers). To that end, it should learn which arms are vital and ensure they remain viable by subsidizing arm pulls if needed. We develop algorithms with sub-linear regret and prove a lower bound showing that these algorithms are optimal up to logarithmic factors.
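To make the setting concrete, the following is a minimal toy sketch of the exposure constraint, not the algorithms developed in this work. It assumes the expected rewards are known in advance (so nothing is learned), and every name in it (`ExposureBandit`, `greedy`, `subsidize`, the reward table, the thresholds) is hypothetical and chosen only for illustration. It contrasts a myopic policy, which lets an under-exposed arm depart, with a policy that subsidizes pulls to keep every arm viable.

```python
class ExposureBandit:
    """Toy environment: an arm departs if it is under-exposed in a period."""

    def __init__(self, rewards, min_pulls, period):
        self.rewards = rewards        # rewards[context][arm], assumed known here
        self.min_pulls = min_pulls    # per-arm exposure threshold per period
        self.period = period          # number of rounds in one period
        self.viable = set(range(len(min_pulls)))
        self.pulls = [0] * len(min_pulls)
        self.t = 0

    def step(self, context, arm):
        if arm not in self.viable:
            raise ValueError(f"arm {arm} has departed")
        self.pulls[arm] += 1
        self.t += 1
        reward = self.rewards[context][arm]
        if self.t % self.period == 0:
            # End of period: arms below their threshold leave for good.
            self.viable = {a for a in self.viable
                           if self.pulls[a] >= self.min_pulls[a]}
            self.pulls = [0] * len(self.min_pulls)
        return reward


def greedy(env, context):
    """Myopic policy: best viable arm for the current context."""
    return max(sorted(env.viable), key=lambda a: env.rewards[context][a])


def subsidize(env, context):
    """Pull an arm that would otherwise miss its threshold; else act greedily."""
    rounds_left = env.period - env.t % env.period
    for a in sorted(env.viable):
        if env.min_pulls[a] - env.pulls[a] >= rounds_left:
            return a
    return greedy(env, context)


def run(policy, contexts):
    # Hypothetical instance: arm 1 is only valuable for the rare context 1.
    env = ExposureBandit(rewards=[[1.0, 0.5], [0.0, 1.0]],
                         min_pulls=[1, 2], period=4)
    return sum(env.step(c, policy(env, c)) for c in contexts)


contexts = [0, 0, 0, 1] * 3  # three periods of four rounds each
print(run(greedy, contexts), run(subsidize, contexts))
```

On this instance the myopic policy collects 10.0 and the subsidizing policy 10.5: the subsidy costs 0.5 per period but preserves the arm that serves context 1. The sketch keeps every arm alive regardless of its value, whereas the setting described above calls for learning which arms are worth subsidizing.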