Sequential recommendation systems aim to provide personalized recommendations for users based on their interaction history. To achieve this, they often incorporate auxiliary information, such as textual descriptions of items, and auxiliary tasks, such as predicting user preferences and intent. Despite numerous efforts to enhance these models, they still suffer from limited personalization. To address this issue, we propose a new paradigm, which we term preference discerning. In preference discerning, we explicitly condition a generative sequential recommendation system on user preferences within its context. To this end, we generate user preferences using Large Language Models (LLMs) based on user reviews and item-specific data. To evaluate the preference-discerning capabilities of sequential recommendation systems, we introduce a novel benchmark that provides a holistic evaluation across various scenarios, including preference steering and sentiment following. We assess current state-of-the-art methods using our benchmark and show that they struggle to accurately discern user preferences. Therefore, we propose a new method named Mender ($\textbf{M}$ultimodal Prefer$\textbf{en}$ce $\textbf{d}$iscern$\textbf{er}$), which improves upon existing methods and achieves state-of-the-art performance on our benchmark. Our results show that Mender can be effectively guided by human preferences, even though they have not been observed during training, paving the way toward more personalized sequential recommendation systems. We will open-source the code and benchmarks upon publication.