Understanding Biases in ChatGPT-based Recommender Systems: Provider Fairness, Temporal Stability, and Recency

This study explores the capabilities and inherent biases of recommender systems built on Large Language Models (RecLLMs), with a focus on ChatGPT-based systems. It examines the contrasting behaviors of generative models and traditional collaborative filtering (CF) models in movie recommendation. The research primarily investigates prompt design strategies and their impact on several aspects of recommendation quality: accuracy, provider fairness, diversity, stability, genre dominance, and temporal freshness (recency). Our experimental analysis reveals that introducing specific 'system roles' and 'prompt strategies' in RecLLMs significantly influences their performance. For instance, role-based prompts enhance fairness and diversity in recommendations, mitigating popularity bias. We find that while GPT-based models do not always match the performance of CF baselines, they exhibit a distinct tendency to recommend newer movies and more diverse genres. Notably, GPT-based models tend to recommend more recent films, particularly those released after 2000, and show a preference for genres such as 'Drama', 'Comedy', and 'Romance' (whereas CF models favor 'Action' and 'Adventure'), presumably because RecLLMs are trained on varied datasets, which allows them to capture recent trends and discussions more effectively than CF models. Interestingly, our results demonstrate that the 'Simple' and 'Chain of Thought (COT)' paradigms yield the highest accuracy. These findings suggest the potential of combining these strategies with scenarios that favor more recent content, thereby offering a more balanced and up-to-date recommendation experience. This study contributes to the understanding of emerging RecLLMs, particularly in the context of harms and biases within these systems.
