Modeling customer shopping intentions is a crucial task for e-commerce because it directly affects user experience and engagement, and accurately understanding customer preferences is essential for providing personalized recommendations. Session-based recommendation, which uses customer session data to predict the next interaction, has become increasingly popular. However, existing session datasets are limited in item attributes, user diversity, and scale, so they cannot capture the full spectrum of user behaviors and preferences. To bridge this gap, we present the Amazon Multilingual Multi-locale Shopping Session Dataset, namely Amazon-M2. It is the first multilingual dataset consisting of millions of user sessions from six different locales, where the major product languages are English, German, Japanese, French, Italian, and Spanish. The dataset can help enhance personalization and the understanding of user preferences, benefiting a variety of existing tasks as well as enabling new ones. To test its potential, we introduce three tasks in this work: (1) next-product recommendation, (2) next-product recommendation with domain shifts, and (3) next-product title generation. Using these tasks, we benchmark a range of algorithms on the proposed dataset and draw new insights for further research and practice. In addition, based on the proposed dataset and tasks, we hosted a competition at KDD CUP 2023 that attracted thousands of users and submissions. The winning solutions and the associated workshop can be accessed at our website https://kddcup23.github.io/.
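To make the next-product recommendation task concrete, the following is a minimal sketch of a first-order item-transition baseline. It assumes that each session is an ordered list of product IDs and that the goal is to rank candidates for the product viewed immediately after the session ends. It is not one of the algorithms benchmarked in this work. The function names, the toy IDs, the choice to exclude items already in the session, and the popularity fallback are all hypothetical illustration choices.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def build_transitions(sessions: Sequence[Sequence[str]]) -> Dict[str, Counter]:
    """Count how often product b directly follows product a across all sessions."""
    transitions: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for prev_item, next_item in zip(session, session[1:]):
            transitions[prev_item][next_item] += 1
    return transitions


def popular_items(sessions: Sequence[Sequence[str]]) -> List[str]:
    """Global popularity ranking, used as a fallback for unseen or sparse items."""
    counts = Counter(item for session in sessions for item in session)
    return [item for item, _ in counts.most_common()]


def recommend_next(
    transitions: Dict[str, Counter],
    session: Sequence[str],
    k: int = 3,
    fallback: Sequence[str] = (),
) -> List[str]:
    """Rank next-product candidates from the last item in the session.

    Hypothetical design choice: items already in the session are excluded.
    """
    seen = set(session)
    last_item = session[-1]
    # .get() avoids inserting empty entries into the defaultdict for unseen items
    followers = transitions.get(last_item, Counter())
    ranked = [item for item, _ in followers.most_common() if item not in seen]
    # Pad with popular items that are neither already ranked nor in the session
    for item in fallback:
        if len(ranked) >= k:
            break
        if item not in seen and item not in ranked:
            ranked.append(item)
    return ranked[:k]
```

On a toy set of sessions such as `[["A", "B", "C"], ["A", "B", "D"], ["B", "C"], ["E", "A", "B"]]`, a session ending in `"B"` is ranked `["C", "D", "A"]`: the two observed successors of `"B"` come first, followed by the most popular unseen item. Transition counts capture only the last step of a session. The sequential and graph-based models typically benchmarked for session-based recommendation aim to use the whole session instead.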