User interactions in online recommendation platforms create interdependencies among content creators: feedback on one creator's content influences the system's learning and, in turn, the exposure of other creators' content. To analyze incentives in such settings, we model collaboration as a multi-agent stochastic linear bandit problem with a transferable utility (TU) cooperative game formulation, where a coalition's value equals the negative sum of its members' cumulative regrets. We show that, for identical (homogeneous) agents with fixed action sets, the induced TU game is convex under mild algorithmic conditions, implying a non-empty core that contains the Shapley value, so that the Shapley allocation is both stable and fair. For heterogeneous agents, the game still admits a non-empty core, though convexity and core membership of the Shapley value are no longer guaranteed. To address this, we propose a simple regret-based payout rule that satisfies three of the four Shapley axioms and also lies in the core. Experiments on the MovieLens-100k dataset illustrate when the empirical payout aligns with, and when it diverges from, Shapley fairness across different settings and algorithms.
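As a rough illustration of the game-theoretic notions named above (convexity, the Shapley value, and core membership), the following minimal sketch checks them by brute force on a three-player toy game. The value function `toy_value`, which sets a coalition's value to minus a pooled regret of `sqrt(|S| * T)`, is a hypothetical stand-in chosen only for illustration; it is not the regret model, algorithm, or payout rule analyzed in the paper, and the player names and horizon are likewise assumptions.

```python
from itertools import combinations, permutations
from math import isclose, sqrt


def toy_value(coalition, horizon=100.0):
    """Hypothetical coalition value: minus a pooled regret of sqrt(|S| * T)."""
    return -sqrt(len(coalition) * horizon) if coalition else 0.0


def all_coalitions(players):
    """Yield every subset of the player set as a frozenset."""
    for size in range(len(players) + 1):
        for members in combinations(players, size):
            yield frozenset(members)


def shapley(players, v):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        seen = frozenset()
        for p in order:
            totals[p] += v(seen | {p}) - v(seen)
            seen = seen | {p}
    return {p: total / len(orders) for p, total in totals.items()}


def is_convex(players, v, tol=1e-9):
    """Supermodularity: v(S | T) + v(S & T) >= v(S) + v(T) for all S, T."""
    subsets = list(all_coalitions(players))
    return all(
        v(s | t) + v(s & t) >= v(s) + v(t) - tol
        for s in subsets
        for t in subsets
    )


def in_core(players, v, payoff, tol=1e-9):
    """Efficient payoff such that no coalition receives less than its own value."""
    efficient = isclose(sum(payoff.values()), v(frozenset(players)), abs_tol=tol)
    stable = all(
        sum(payoff[p] for p in s) >= v(s) - tol for s in all_coalitions(players)
    )
    return efficient and stable


players = ("a", "b", "c")  # illustrative player names
phi = shapley(players, toy_value)
print(phi)
print("convex:", is_convex(players, toy_value))
print("Shapley value in core:", in_core(players, toy_value, phi))
```

The brute-force enumeration is exponential in the number of players, so this is only practical for very small games; it is meant to make the definitions concrete rather than to reproduce any experiment.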