We propose and design recommendation systems that incentivize efficient exploration. Agents arrive sequentially, choose actions, and receive rewards drawn from fixed but unknown action-specific distributions. The recommendation system presents each agent with the actions and rewards of a subsequence of past agents, chosen ex ante. Thus, the agents engage in sequential social learning, moderated by these subsequences. We asymptotically attain the optimal regret rate for exploration, using a flexible frequentist behavioral model and mitigating the rationality and commitment assumptions inherent in prior work. We suggest three components of effective recommendation systems: independent focus groups, group aggregators, and interlaced information structures.