Online platforms in the internet economy commonly rely on recommender systems that recommend products (or "arms") to users (or "agents"). A key challenge in this setting comes from myopic agents, who are naturally incentivized to exploit by choosing the arm that looks best under current information rather than exploring alternatives to gather information that benefits the collective. We propose a novel recommender system that respects agents' incentives while achieving asymptotically optimal performance, as measured by regret over repeated interactions. Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets, where interactions between agents and arms are mediated by the platform's recommender system. The model incorporates incentive constraints induced by agents' opportunity costs. When opportunity costs are known to the platform, we show that an incentive-compatible recommendation algorithm exists. This algorithm pools recommendations between a genuinely good arm and an unknown arm using a randomized, adaptive strategy. When opportunity costs are unknown, we introduce an algorithm that randomly pools recommendations across all arms, using the cumulative loss of each arm as feedback for strategic exploration. We show that both algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation. Code for running the proposed algorithms and reproducing the results is available on GitHub.
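To make the idea of pooling a genuinely good arm with an unknown arm concrete, here is a minimal Python sketch for the known-opportunity-cost case. It is not the paper's algorithm. It assumes a simplified incentive constraint: an agent follows the recommendation as long as the expected loss from doing so, relative to always taking the good arm, does not exceed the agent's opportunity cost. The names `explore_probability`, `recommend`, `mean_gap`, and `opportunity_cost` are hypothetical and chosen only for illustration.

```python
import random


def explore_probability(mean_gap: float, opportunity_cost: float) -> float:
    """Largest probability p of recommending the unknown arm such that
    p * mean_gap <= opportunity_cost (the assumed incentive constraint).

    mean_gap: expected reward of the good arm minus the prior expected
              reward of the unknown arm (an illustrative quantity).
    opportunity_cost: loss the agent tolerates before ignoring the platform.
    """
    if opportunity_cost < 0:
        raise ValueError("opportunity_cost must be non-negative")
    if mean_gap <= 0:
        # The unknown arm is not expected to be worse, so exploring is free.
        return 1.0
    return min(1.0, opportunity_cost / mean_gap)


def recommend(good_arm, unknown_arm, mean_gap, opportunity_cost, rng):
    """Randomly pool the two arms into a single recommendation.

    The agent sees only the recommended arm, not whether it was picked
    for exploration or exploitation.
    """
    p = explore_probability(mean_gap, opportunity_cost)
    return unknown_arm if rng.random() < p else good_arm


if __name__ == "__main__":
    rng = random.Random(0)
    # An adaptive variant would recompute mean_gap each round from the
    # rewards observed so far; here it is held fixed for simplicity.
    picks = [recommend("good", "unknown", 0.5, 0.1, rng) for _ in range(1000)]
    print("fraction of exploratory recommendations:",
          picks.count("unknown") / len(picks))
```

In this sketch, a larger opportunity cost or a smaller gap between the two arms lets the platform explore more often, and an opportunity cost of zero rules out exploration whenever the unknown arm is expected to be worse.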