We propose and design recommendation systems that incentivize efficient exploration. Agents arrive sequentially, choose actions, and receive rewards drawn from fixed but unknown action-specific distributions. The recommendation system presents each agent with the actions and rewards of a subsequence of past agents, chosen ex ante. Thus, the agents engage in sequential social learning, moderated by these subsequences. We asymptotically attain the optimal regret rate for exploration, using a flexible frequentist behavioral model and mitigating the rationality and commitment assumptions inherent in prior work. We suggest three components of effective recommendation systems: independent focus groups, group aggregators, and interlaced information structures.
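To make the setup concrete, here is a minimal toy simulation, not the construction from this work and with no regret guarantee. It assumes Bernoulli rewards and a hypothetical greedy frequentist agent that picks the action with the highest empirical mean in the history it is shown. Unseen actions get a hypothetical `prior_mean` of 0.5, and ties are broken at random. The information structure is fixed ex ante: agents in each independent focus group see only earlier members of their own group, and group aggregators see every focus-group agent. Interlaced information structures are not modeled. All function names and parameters are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def build_info_structure(num_groups: int, group_size: int,
                         num_aggregators: int) -> List[List[int]]:
    """For each agent t, list the indices of past agents whose (action, reward)
    pairs t is shown. Fixed ex ante: depends only on positions, never on rewards."""
    structure: List[List[int]] = []
    for g in range(num_groups):
        start = g * group_size
        for j in range(group_size):
            # Independent focus group: see only earlier members of the same group.
            structure.append(list(range(start, start + j)))
    focus_total = num_groups * group_size
    for _ in range(num_aggregators):
        # Group aggregator (hypothetical variant): see all focus-group agents,
        # but not other aggregators.
        structure.append(list(range(focus_total)))
    return structure


def greedy_choice(history: Sequence[Tuple[int, float]], num_actions: int,
                  rng: random.Random, prior_mean: float = 0.5) -> int:
    """Hypothetical frequentist agent: choose the action with the highest
    empirical mean; unseen actions get prior_mean; ties broken uniformly."""
    totals = [0.0] * num_actions
    counts = [0] * num_actions
    for action, reward in history:
        totals[action] += reward
        counts[action] += 1
    means = [totals[a] / counts[a] if counts[a] else prior_mean
             for a in range(num_actions)]
    best = max(means)
    return rng.choice([a for a in range(num_actions) if means[a] == best])


def simulate(true_means: Sequence[float], structure: List[List[int]],
             seed: int = 0) -> List[Tuple[int, float]]:
    """Run agents in arrival order; agent t sees only the pairs in structure[t]."""
    rng = random.Random(seed)
    log: List[Tuple[int, float]] = []
    for observed in structure:
        history = [log[i] for i in observed]  # indices always refer to earlier agents
        action = greedy_choice(history, len(true_means), rng)
        # Bernoulli reward with the action's (unknown to agents) mean.
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        log.append((action, reward))
    return log
```

For example, `simulate([0.3, 0.7], build_info_structure(4, 5, 10))` returns 30 (action, reward) pairs, and the last 10 are the aggregators' choices. In this toy, separate focus groups can settle on different actions, which is one way the aggregators end up seeing data on more than one arm.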