Understanding Biases in ChatGPT-based Recommender Systems: Provider Fairness, Temporal Stability, and Recency

This study explores the nuanced capabilities and inherent biases of Recommender Systems using Large Language Models (RecLLMs), with a focus on ChatGPT-based systems. It examines the contrasting behaviors of generative models and traditional collaborative filtering (CF) models in movie recommendation. The research primarily investigates prompt design strategies and their impact on various aspects of recommendation quality, including accuracy, provider fairness, diversity, stability, genre dominance, and temporal freshness (recency). Our experimental analysis reveals that introducing specific 'system roles' and 'prompt strategies' in RecLLMs significantly influences their performance. For instance, role-based prompts enhance fairness and diversity in recommendations, mitigating popularity bias. We find that while GPT-based models do not always match the performance of CF baselines, they exhibit a distinctive tendency to recommend newer movies and a more diverse range of genres. Notably, GPT-based models tend to recommend more recent films, particularly those released after 2000, and show a preference for genres such as 'Drama', 'Comedy', and 'Romance' (whereas CF models favor 'Action' and 'Adventure'), presumably because RecLLMs are trained on varied datasets that allow them to capture recent trends and discussions more effectively than CF models. Interestingly, our results show that the 'Simple' and 'Chain of Thought (COT)' paradigms yield the highest accuracy. These findings suggest the potential of combining these strategies with scenarios that favor more recent content, thereby offering a more balanced and up-to-date recommendation experience. This study contributes significantly to the understanding of emerging RecLLMs, particularly regarding the harms and biases within these systems.
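The abstract does not state how temporal freshness is measured. As a minimal sketch, assuming recency is summarized per recommendation list by the mean release year and the share of films released after a cutoff year, one could compute both as follows. The function `recency_stats`, the `release_year` mapping, the strict `> cutoff` boundary, and the toy movie IDs and years are all hypothetical illustrations, not the study's metric or data.

```python
def recency_stats(recommended_ids, release_year, cutoff=2000):
    """Hypothetical recency summary for one recommendation list.

    recommended_ids: iterable of movie IDs (assumed format).
    release_year: dict mapping movie ID -> release year (assumed metadata).
    Returns (mean release year, share of items released strictly after `cutoff`).
    Items without a known release year are skipped.
    """
    years = [release_year[i] for i in recommended_ids if i in release_year]
    if not years:
        # No usable items: no mean year, and zero share of recent items.
        return None, 0.0
    mean_year = sum(years) / len(years)
    share_recent = sum(1 for y in years if y > cutoff) / len(years)
    return mean_year, share_recent


# Toy example with made-up IDs and years (not data from the study).
years = {"m1": 1994, "m2": 2008, "m3": 2015, "m4": 1999}
print(recency_stats(["m2", "m3", "m1"], years))  # e.g. an LLM-style list
print(recency_stats(["m1", "m4", "m2"], years))  # e.g. a CF-style list
```

Averaging such per-list values over all users would give one aggregate recency score per model or prompt strategy; the exact aggregation used in the study is not specified here.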
