We propose and design recommendation systems that incentivize efficient exploration. Agents arrive sequentially, choose actions, and receive rewards drawn from fixed but unknown action-specific distributions. The recommendation system shows each agent the actions and rewards of a subsequence of past agents, chosen ex ante. The agents thus engage in sequential social learning, moderated by these subsequences. We asymptotically attain the optimal regret rate for exploration while using a flexible frequentist behavioral model and mitigating the rationality and commitment assumptions inherent in prior work. We suggest three components of effective recommendation systems: independent focus groups, group aggregators, and interlaced information structures.
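To make the setting concrete, the following is a minimal simulation sketch of the setup described in the abstract, not the construction from the paper. Everything in it is an illustrative assumption: the Bernoulli rewards, the names `simulate`, `greedy_choice`, `focus_groups`, and `full_history`, the greedy empirical-mean behavior standing in for a frequentist agent, and the default estimate of 0.5 for actions the agent has no data on. The only feature taken from the text is that the subsequence shown to agent t depends on t alone, so it is fixed before any data is realized.

```python
import random


def full_history(t):
    """Baseline (assumed): agent t sees every earlier agent."""
    return range(t)


def focus_groups(group_size):
    """Assumed structure: agents are split into consecutive blocks, and each
    agent sees only the earlier members of its own block."""
    def visible(t):
        start = (t // group_size) * group_size
        return range(start, t)
    return visible


def greedy_choice(observations, n_actions, default=0.5):
    """Pick the action with the highest empirical mean reward among the
    observations shown. Unobserved actions get `default` (an assumption);
    ties go to the lowest action index."""
    totals = [0.0] * n_actions
    counts = [0] * n_actions
    for action, reward in observations:
        totals[action] += reward
        counts[action] += 1
    estimates = [
        totals[a] / counts[a] if counts[a] else default
        for a in range(n_actions)
    ]
    return estimates.index(max(estimates))


def simulate(means, n_agents, visible, seed=0):
    """Run sequential agents against Bernoulli actions with the given means.

    `visible(t)` returns the indices of past agents shown to agent t; it is a
    function of t only, so the subsequences are chosen ex ante.
    Returns the full history and the cumulative expected regret."""
    rng = random.Random(seed)
    history = []  # (action, reward) for every agent, in arrival order
    regret = 0.0
    best_mean = max(means)
    for t in range(n_agents):
        shown = [history[s] for s in visible(t)]
        action = greedy_choice(shown, len(means))
        reward = 1.0 if rng.random() < means[action] else 0.0
        history.append((action, reward))
        regret += best_mean - means[action]
    return history, regret


if __name__ == "__main__":
    means = [0.4, 0.6]
    for name, visible in [("full history", full_history),
                          ("focus groups of 20", focus_groups(20))]:
        _, regret = simulate(means, n_agents=200, visible=visible, seed=1)
        print(f"{name}: cumulative expected regret = {regret:.1f}")
```

Swapping the `visible` function is the only change needed to compare information structures under the same agent behavior. The sketch makes no claim about regret rates; it only shows how an ex ante choice of subsequences controls what each agent learns from.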