Diversity is an important evaluation criterion for recommender systems beyond accuracy, yet users differ in their willingness to engage with novel and diverse content. In this work, we investigate how a Large Language Model (LLM)-based multi-agent system supports users' exploration of diverse recommendations, and how individual characteristics shape user experiences. We conducted a between-subjects user study (N = 100) comparing a single-agent system (baseline) with a multi-agent system for movie recommendations. We measured Perceived Accuracy, diversity, novelty, and overall rating, and examined the influence of personal characteristics, including personality traits, demographics, prior experience with generative AI (GenAI) recommendations, and GenAI skepticism. Results show that the multi-agent system significantly increases Perceived Novelty and Shannon Diversity. Conscientiousness is positively associated with Perceived Accuracy and diversity, whereas extraversion is negatively associated with Perceived Diversity. Prior experience with GenAI-based recommendations is positively associated with Shannon Diversity, while skepticism toward GenAI is negatively associated with it. We also observe significant interaction effects between system design and user characteristics. These findings highlight the importance of personality-aware conversational recommender systems and caution against one-size-fits-all multi-agent designs.
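For readers unfamiliar with the Shannon Diversity measure named above, the following is a minimal sketch of how such an index is commonly computed, as the Shannon entropy of category proportions. The abstract does not say which categories the study used or which logarithm base, so the use of genre labels, the natural logarithm, and the function name `shannon_diversity` are all assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_diversity(labels):
    """Shannon entropy H = -sum(p_i * ln(p_i)) over category proportions.

    `labels` is a list of category labels (assumed here to be movie genres).
    Returns 0.0 for an empty list.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical genre labels for two recommendation lists (not study data).
narrow = ["Action", "Action", "Action", "Thriller"]
broad = ["Action", "Drama", "Comedy", "Documentary"]

print(shannon_diversity(narrow))
print(shannon_diversity(broad))
```

Under these assumptions, a list spread evenly over more categories scores higher than one concentrated in a single category, which is the sense in which a system can "increase" Shannon Diversity.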