Sequential recommendation systems that model dynamic preferences based on a user's past behavior are crucial to e-commerce. Recent studies on these systems have considered various types of side information, such as images and texts. However, multimodal data have not yet been utilized directly to recommend products to users. In this study, we propose an attention-based sequential recommendation method that employs multimodal data of items, such as images, texts, and categories. First, we extract image and text features using pre-trained VGG and BERT models and convert categories into multi-labeled forms. Subsequently, attention operations are performed independently on the item sequence and the multimodal representations. Finally, the individual attention outputs are integrated through an attention fusion function. In addition, we apply a multitask learning loss for each modality to improve generalization performance. Experimental results on the Amazon datasets show that the proposed method outperforms conventional sequential recommendation systems.
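The per-modality attention and fusion steps described above might be sketched as follows. This is a toy NumPy sketch under stated assumptions: random arrays stand in for the VGG/BERT/category features, and the mean-pooled modality summaries and fusion query vector `w` are illustrative simplifications, not the paper's exact fusion function.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(seq):
    # seq: (T, d) item-sequence features for one modality.
    # Scaled dot-product self-attention over the sequence.
    scores = seq @ seq.T / np.sqrt(seq.shape[1])
    return softmax(scores, axis=-1) @ seq

rng = np.random.default_rng(0)
T, d = 5, 8  # sequence length, feature dimension

# Stand-ins for pre-extracted VGG (image), BERT (text),
# and multi-label category embeddings of the item sequence.
modalities = {name: rng.normal(size=(T, d))
              for name in ("image", "text", "category")}

# Attention is applied independently per modality.
attended = {m: self_attention(x) for m, x in modalities.items()}

# Attention fusion (illustrative): score a mean-pooled summary of
# each modality against a learned query vector w, then combine.
w = rng.normal(size=d)  # hypothetical fusion query
summaries = np.stack([a.mean(axis=0) for a in attended.values()])  # (3, d)
fusion_weights = softmax(summaries @ w)       # one weight per modality
fused = fusion_weights @ summaries            # (d,) fused representation
```

In a full model, `fused` would feed a prediction head, and each modality branch would additionally receive its own task loss so the shared encoder generalizes across modalities.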