Large Language Models (LLMs) have been applied to recommendation tasks such as suggesting items to buy and news articles to read. Point-of-Interest (POI) recommendation is a relatively new area for sequential recommendation based on language representations of multimodal datasets. As a first step toward proving our concept, we focused on restaurant recommendation based on each user's past visit history. When choosing the next restaurant to visit, a user would consider the genre and location of the venue and, if available, pictures of the dishes served there. We built a pseudo restaurant check-in history dataset from the Foursquare dataset and the FoodX-251 dataset by converting pictures into text descriptions with a multimodal model called LLaVA. We then trained models with Recformer, a language-based sequential recommendation framework proposed in 2023. A model trained on this semi-multimodal dataset outperformed a model trained on the same dataset without the picture descriptions. This suggests that the semi-multimodal model better reflects actual human behaviour and that our path toward a multimodal recommendation model is heading in the right direction.
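Before the check-in histories reach Recformer, each visit has to be turned into text. Below is a minimal Python sketch of that step. It assumes that each venue carries a genre, a location and an optional picture description that was already generated offline (the LLaVA call itself is not shown). It also assumes that attributes are flattened into `key value` text, following Recformer's way of representing an item as key-value attribute pairs. The class `Venue`, the functions `item_to_text` and `history_to_texts`, the field names and the example values are hypothetical illustrations, not the authors' code or data. Passing `use_pictures=False` yields the picture-free input used for the baseline comparison.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Venue:
    """Hypothetical venue record; field names are illustrative assumptions."""
    venue_id: str
    genre: str
    location: str
    # Text description of a dish picture, assumed to be produced offline
    # by a multimodal captioning model such as LLaVA.
    picture_description: Optional[str] = None


def item_to_text(venue: Venue, use_pictures: bool = True) -> str:
    """Flatten a venue's attributes into 'key value' text (Recformer-style)."""
    fields = [("genre", venue.genre), ("location", venue.location)]
    # Only the semi-multimodal variant includes the picture description.
    if use_pictures and venue.picture_description:
        fields.append(("dishes", venue.picture_description))
    return " ".join(f"{key} {value}" for key, value in fields)


def history_to_texts(
    history: List[str], venues: Dict[str, Venue], use_pictures: bool = True
) -> List[str]:
    """Convert a user's chronological check-in history into item texts."""
    return [item_to_text(venues[vid], use_pictures) for vid in history]


if __name__ == "__main__":
    venues = {
        "v1": Venue("v1", "Ramen Restaurant", "Shinjuku",
                    "a bowl of ramen with sliced pork and a soft-boiled egg"),
        "v2": Venue("v2", "Cafe", "Shibuya"),  # no picture available
    }
    print(history_to_texts(["v1", "v2"], venues))
    print(history_to_texts(["v1", "v2"], venues, use_pictures=False))
```

Venues without a picture simply fall back to genre and location, which mirrors the "if available" condition in the abstract. Both variants share the same sequence of venues, so any difference in recommendation quality can be attributed to the picture descriptions.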