FairRec: Fairness Testing for Deep Recommender Systems

Deep learning-based recommender systems (DRSs) are increasingly deployed in industry and bring significant convenience to people's daily lives. However, recommender systems have also been shown to suffer from multiple issues, e.g., the echo chamber and the Matthew effect, in which the notion of "fairness" plays a core role. While many fairness notions and corresponding fairness testing approaches have been developed for traditional deep classification models, they are hardly applicable to DRSs. One major difficulty is the lack of a systematic understanding of, and mapping between, the existing fairness notions and the diverse testing requirements of deep recommender systems, not to mention further testing or debugging activities. To address this gap, we propose FairRec, a unified framework that supports fairness testing of DRSs from multiple customized perspectives, e.g., model utility, item diversity, and item popularity. We also propose a novel, efficient search-based testing approach, a double-ended discrete particle swarm optimization (DPSO) algorithm, to tackle the new challenge of effectively searching for hidden fairness issues, in the form of certain disadvantaged groups, among a vast number of candidate groups. Given the testing report, we show that adopting a simple re-ranking mitigation strategy on the identified disadvantaged groups can significantly improve the fairness of DRSs. We conducted extensive experiments on multiple industry-level DRSs adopted by leading companies. The results confirm that FairRec is effective and efficient in identifying deeply hidden fairness issues, e.g., achieving 95% testing accuracy while using only one-half to one-eighth of the time.
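
To make the search problem concrete, the following minimal sketch enumerates candidate user groups defined by combinations of sensitive-attribute values and reports the group whose mean per-user score trails the remaining users the most. It is an illustration only and not the FairRec implementation: it uses brute-force enumeration rather than the double-ended DPSO search, and the function name `find_disadvantaged_group`, the attribute names, the `min_size` threshold, and the toy scores are all hypothetical. The per-user score stands in for any of the perspectives named above (e.g., a utility or diversity metric).

```python
from itertools import combinations, product
from statistics import mean


def find_disadvantaged_group(users, scores, attributes, min_size=2):
    """Return (gap, group) for the group whose mean score trails the rest most.

    users      -- list of dicts mapping attribute name -> value (hypothetical schema)
    scores     -- per-user metric values, aligned with `users`
    attributes -- sensitive attribute names used to define candidate groups
    min_size   -- ignore groups with fewer members than this (assumed threshold)
    """
    worst_gap, worst_group = 0.0, None
    # Candidate groups: every combination of attributes and of their observed values.
    for r in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, r):
            observed = [sorted({u[a] for u in users}) for a in attrs]
            for combo in product(*observed):
                group = dict(zip(attrs, combo))
                member = [all(u[a] == v for a, v in group.items()) for u in users]
                inside = [s for s, m in zip(scores, member) if m]
                outside = [s for s, m in zip(scores, member) if not m]
                if len(inside) < min_size or not outside:
                    continue
                # Positive gap means the group is served worse than everyone else.
                gap = mean(outside) - mean(inside)
                if gap > worst_gap:
                    worst_gap, worst_group = gap, group
    return worst_gap, worst_group


# Toy data: six users with two sensitive attributes and a per-user score.
users = [
    {"gender": "F", "age": "young"},
    {"gender": "F", "age": "old"},
    {"gender": "M", "age": "young"},
    {"gender": "M", "age": "old"},
    {"gender": "F", "age": "old"},
    {"gender": "M", "age": "young"},
]
scores = [0.9, 0.3, 0.8, 0.7, 0.2, 0.9]

gap, group = find_disadvantaged_group(users, scores, ["gender", "age"])
print(group, round(gap, 3))
```

The number of candidate groups grows combinatorially with the number of attributes and values, so exhaustive enumeration of this kind quickly becomes impractical, which is the motivation for a search-based approach.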
