Current practice for evaluating recommender systems typically focuses on point estimates of user-oriented effectiveness metrics or business metrics, sometimes combined with additional metrics for considerations such as diversity and novelty. In this paper, we argue that researchers and practitioners need to attend more closely to the various distributions that arise from a recommender system (or other information access system) and to the sources of uncertainty that lead to these distributions. One immediate implication of our argument is that both researchers and practitioners must more thoroughly report and examine the distribution of utility between and within different stakeholder groups. However, distributions of various forms arise in many more aspects of the recommender systems experimental process, and distributional thinking has substantial ramifications for how we design and evaluate recommender systems and for how we present evaluation and research results. Leveraging and emphasizing distributions in the evaluation of recommender systems is a necessary step to ensure that the systems provide appropriate and equitably distributed benefit to the people they affect.