CFaiRLLM: Consumer Fairness Evaluation in Large-Language Model Recommender System

In the evolving landscape of recommender systems, the integration of Large Language Models (LLMs) such as ChatGPT marks a new era, introducing the concept of Recommendation via LLM (RecLLM). While these advancements promise unprecedented personalization and efficiency, they also raise critical concerns regarding fairness, particularly in how recommendations might inadvertently perpetuate or amplify biases associated with sensitive user attributes. To address these concerns, our study introduces a comprehensive evaluation framework, CFaiRLLM, aimed at evaluating (and thereby mitigating) biases on the consumer side within RecLLMs. Our research methodically assesses the fairness of RecLLMs by examining how recommendations vary with the inclusion of sensitive attributes such as gender, age, and their intersections, through both similarity alignment and true preference alignment. By analyzing recommendations generated under different conditions, including the use of sensitive attributes in user prompts, our framework identifies potential biases in the recommendations provided. A key part of our study explores how different strategies for constructing user profiles (random, top-rated, recent) affect the alignment between recommendations made without consideration of sensitive attributes and those that are sensitive-attribute-aware, shedding light on the bias mechanisms within RecLLMs. Our findings reveal notable disparities in the fairness of recommendations, particularly when sensitive attributes are integrated into the recommendation process, either individually or in combination. The analysis further demonstrates that the choice of user profile sampling strategy plays a significant role in fairness outcomes, underscoring the complexity of achieving fair recommendations in the era of LLMs.
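To make the evaluation setup more concrete, the following is a minimal sketch of the three profile sampling strategies (random, top-rated, recent) and of one possible similarity-alignment measure between a neutral and a sensitive-attribute-aware recommendation list. It is an illustration under stated assumptions, not the framework's implementation: the `Interaction` record layout, the function names `sample_profile`, `build_prompt`, and `jaccard`, the prompt wording, and the choice of Jaccard overlap as the similarity measure are all hypothetical.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical interaction record: (item title, rating, timestamp).
Interaction = Tuple[str, float, int]


def sample_profile(history: List[Interaction], strategy: str,
                   k: int, seed: int = 0) -> List[str]:
    """Pick k items from a user's history with one of three strategies."""
    if strategy == "random":
        rng = random.Random(seed)  # seeded so the sample is reproducible
        chosen = rng.sample(history, min(k, len(history)))
    elif strategy == "top-rated":
        chosen = sorted(history, key=lambda x: x[1], reverse=True)[:k]
    elif strategy == "recent":
        chosen = sorted(history, key=lambda x: x[2], reverse=True)[:k]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [item for item, _, _ in chosen]


def build_prompt(profile: List[str], sensitive: Optional[str] = None) -> str:
    """Build a neutral prompt, or a sensitive one if an attribute is given.

    The wording is an assumption made for illustration only.
    """
    who = f"I am a {sensitive} user. " if sensitive else ""
    return (f"{who}I have enjoyed: {', '.join(profile)}. "
            "Please recommend 10 more items.")


def jaccard(neutral: List[str], sensitive: List[str]) -> float:
    """Set overlap between two recommendation lists (1.0 means identical)."""
    a, b = set(neutral), set(sensitive)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    history = [("A", 5.0, 10), ("B", 3.0, 30), ("C", 4.0, 20), ("D", 2.0, 40)]
    for strategy in ("random", "top-rated", "recent"):
        profile = sample_profile(history, strategy, k=2)
        print(strategy, "|", build_prompt(profile, sensitive="female"))
    # The two lists below stand in for LLM outputs; no model is called here.
    print(jaccard(["A", "B", "C"], ["B", "C", "D"]))
```

In this sketch, a lower overlap between the neutral list and the sensitive-attribute-aware list for the same user would indicate that mentioning the attribute changed the recommendations more. Comparing that overlap across user groups and across sampling strategies is the kind of analysis the abstract describes.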
