A Comprehensive Survey of Evaluation Techniques for Recommendation Systems

The effectiveness of recommendation systems is pivotal to user engagement and satisfaction on online platforms. As these systems increasingly influence user choices, their evaluation goes beyond technical performance and becomes central to business success. This paper addresses the multifaceted nature of recommendation system evaluation by introducing a comprehensive suite of metrics, each tailored to capture a distinct aspect of system performance. We discuss similarity metrics, which quantify the precision of content-based and collaborative filtering mechanisms, and candidate generation metrics, which measure how well the system identifies a broad yet relevant range of items. We then examine predictive metrics, which assess the accuracy of forecasted preferences; ranking metrics, which evaluate the order in which recommendations are presented; and business metrics, which align system performance with economic objectives. Our approach emphasizes the contextual application of these metrics and their interdependencies. We identify the strengths and limitations of current evaluation practices and highlight the nuanced trade-offs that emerge when optimizing recommendation systems across different metrics. The paper concludes by proposing a framework for selecting and interpreting these metrics not only to improve system performance but also to advance business goals. This work aims to help researchers and practitioners critically assess recommendation systems and to foster the development of more nuanced, effective, and economically viable personalization strategies. Our code is available on GitHub at https://github.com/aryan-jadon/Evaluation-Metrics-for-Recommendation-Systems.
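The abstract names five families of metrics without defining any of them. As a rough illustration only, the sketch below computes one commonly used metric from each family: cosine similarity (similarity), recall@k (candidate generation), RMSE (predictive), NDCG@k (ranking), and click-through rate (business). The choice of these representatives and all function names are our assumptions; they are not necessarily the definitions used in the paper or its repository.

```python
import math
from typing import Sequence

# Hypothetical illustration: one common metric per family named in the abstract.
# These are not taken from the paper's code.


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity metric: cosine of the angle between two user/item vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for zero vectors; treat as no similarity
    return dot / (norm_a * norm_b)


def recall_at_k(candidates: Sequence[str], relevant: set[str], k: int) -> float:
    """Candidate-generation metric: share of relevant items found in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in candidates[:k] if item in relevant)
    return hits / len(relevant)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Predictive metric: root mean squared error of forecasted ratings."""
    if not actual:
        raise ValueError("actual must be non-empty")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def ndcg_at_k(ranking: Sequence[str], gains: dict[str, float], k: int) -> float:
    """Ranking metric: DCG of the given order divided by DCG of the ideal order."""

    def dcg(scores: Sequence[float]) -> float:
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(scores))

    actual = dcg([gains.get(item, 0.0) for item in ranking[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


def click_through_rate(clicks: int, impressions: int) -> float:
    """Business metric: fraction of shown recommendations that were clicked."""
    return clicks / impressions if impressions else 0.0
```

For example, `recall_at_k(["a", "b", "c"], {"a", "c", "d"}, 2)` finds one of three relevant items in the top two and returns 1/3, while `ndcg_at_k(["b", "a"], {"a": 3.0, "b": 1.0}, 2)` is below 1 because the more relevant item is ranked second. The trade-offs the abstract mentions show up even here: a list can score well on recall@k yet poorly on NDCG@k if the relevant items are present but ordered badly.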

翻译：推荐系统的有效性对于在线平台的用户参与度和满意度至关重要。随着这些推荐系统对用户选择的影响日益加深，其评估已超越单纯的技术性能，成为商业成功的关键。本文通过引入一整套评估指标来应对推荐系统评估的多面性，每个指标旨在捕捉系统性能的不同方面。我们讨论了量化基于内容和协同过滤机制精度的相似度指标，以及衡量系统识别广泛且相关物品集合能力的候选生成指标。随后，我们深入探讨了评估预测偏好准确性的预测指标、评估推荐呈现顺序的排序指标，以及将系统性能与经济目标对齐的业务指标。我们的方法强调这些指标的情境化应用及其相互依赖性。本文识别了当前评估实践的优势与局限性，并突显了在不同指标间优化推荐系统时所出现的细微权衡。最后，本文提出一个框架，用于选择和解读这些指标，以不仅提升系统性能，还能推进商业目标。本研究旨在帮助研究人员和从业者批判性地评估推荐系统，并推动开发更细致、有效且经济可行的个性化策略。我们的代码可在GitHub上获取：https://github.com/aryan-jadon/Evaluation-Metrics-for-Recommendation-Systems。