The Fault in Our Recommendations: On the Perils of Optimizing the Measurable

Recommendation systems are widespread and, through customized recommendations, promise to match users with options they will like. To that end, they collect and use data on engagement. Most recommendation systems are ranking-based: they rank and recommend items according to predicted engagement. However, engagement signals are often only a crude proxy for utility, as data on the latter is rarely collected or available. This paper explores the following question: by optimizing for measurable proxies, are recommendation systems at risk of significantly under-delivering on utility? If so, how can one improve utility, which is seldom measured? To study these questions, we introduce a model of repeated user consumption in which, at each interaction, users select between an outside option and the best option from a recommendation set. Our model accounts for user heterogeneity, with the majority preferring "popular" content and a minority favoring "niche" content. The system initially lacks knowledge of individual user preferences but can learn them by observing users' choices over time. Our theoretical and numerical analyses demonstrate that optimizing for engagement can lead to significant utility losses. Instead, we propose a utility-aware policy that initially recommends a mix of popular and niche content. As the platform becomes more forward-looking, our utility-aware policy achieves the best of both worlds: near-optimal utility and near-optimal engagement simultaneously. Our study elucidates an important feature of recommendation systems: given the ability to suggest multiple items, one can perform significant exploration without incurring significant reductions in engagement. By recommending high-risk, high-reward items alongside popular items, systems can enhance discovery of high-utility items without significantly affecting engagement.
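To make the gap between engagement and utility concrete, the following is a minimal single-interaction sketch, not the paper's model. It assumes two user types and two item kinds; each item independently beats the outside option with some probability, and the user consumes the highest-utility item that does. All numbers and names (`HIT_PROB`, `UTILITY`, `TYPE_SHARE`, `evaluate`) are hypothetical and chosen purely for illustration.

```python
# Toy, single-interaction illustration. All parameters below are hypothetical
# assumptions, not values from the paper.

# Share of each user type in the population (the majority prefers popular content).
TYPE_SHARE = {"popular": 0.8, "niche": 0.2}

# Probability that an item beats the user's outside option, keyed by (user_type, item_kind).
HIT_PROB = {
    ("popular", "popular"): 0.6,
    ("popular", "niche"): 0.1,
    ("niche", "popular"): 0.5,  # niche users still engage with popular items...
    ("niche", "niche"): 0.4,
}

# Utility the user receives if they consume the item, keyed by (user_type, item_kind).
UTILITY = {
    ("popular", "popular"): 1.0,
    ("popular", "niche"): 0.1,
    ("niche", "popular"): 0.2,  # ...but derive little utility from them
    ("niche", "niche"): 2.0,
}


def evaluate(rec_set):
    """Return (expected engagement, expected utility) for a list of item kinds.

    The platform does not know the user's type, so both quantities are
    averaged over TYPE_SHARE.
    """
    engagement = 0.0
    utility = 0.0
    for user_type, share in TYPE_SHARE.items():
        # The user consumes the highest-utility item among those that beat
        # the outside option, so walk the set in decreasing utility order.
        ranked = sorted(
            rec_set, key=lambda kind: UTILITY[(user_type, kind)], reverse=True
        )
        p_no_hit_so_far = 1.0
        expected_utility = 0.0
        for kind in ranked:
            p = HIT_PROB[(user_type, kind)]
            expected_utility += p_no_hit_so_far * p * UTILITY[(user_type, kind)]
            p_no_hit_so_far *= 1.0 - p
        engagement += share * (1.0 - p_no_hit_so_far)
        utility += share * expected_utility
    return engagement, utility


if __name__ == "__main__":
    for name, rec_set in [
        ("all popular", ["popular", "popular", "popular"]),
        ("mixed", ["popular", "popular", "niche"]),
    ]:
        eng, util = evaluate(rec_set)
        print(f"{name:12s} engagement={eng:.4f} utility={util:.4f}")
```

Under these assumed numbers, the all-popular set has higher expected engagement while the mixed set has higher expected utility. The sketch covers only one interaction with no learning, so it does not capture the repeated-consumption dynamics or the forward-looking policy described above.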
