Large language models (LLMs) are increasingly applied to recommendation tasks such as suggesting items to buy or news articles to read. Point-of-interest (POI) recommendation, however, is still a fairly new application area for sequential recommendation based on language representations of multimodal datasets. As a first proof of concept, we focus on restaurant recommendation based on each user's past visit history. When choosing the next restaurant to visit, a user is likely to consider the genre and location of the venue and, if available, pictures of the dishes served there. We created a pseudo restaurant check-in history dataset from the Foursquare dataset and the FoodX-251 dataset by converting pictures into text descriptions with a multimodal model called LLaVA, and trained Recformer, a language-based sequential recommendation framework proposed in 2023, on it. A model trained on this semi-multimodal dataset outperformed a model trained on the same dataset without picture descriptions. This suggests that the semi-multimodal model better reflects actual human behaviour and that our path toward a multimodal recommendation model is heading in the right direction.
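To make the idea of a "semi-multimodal" check-in history concrete, the following minimal Python sketch shows one way a restaurant visit could be flattened into text, with and without a picture description. It is an illustration under stated assumptions, not the pipeline used in this work: the `CheckIn` class, the field names `genre`, `location` and `dish_description`, the `key: value` text format, and the sample values are all hypothetical, and the picture description is assumed to have been produced beforehand by an image-to-text model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckIn:
    """One visit in a user's history (hypothetical schema, for illustration only)."""
    venue_id: str
    genre: str
    location: str
    # Text description of a dish picture, assumed to come from an image-to-text model.
    dish_description: Optional[str] = None


def flatten_item(item: CheckIn, use_pictures: bool = True) -> str:
    """Turn one check-in into a flat 'key: value' text string."""
    pairs = [("genre", item.genre), ("location", item.location)]
    # The picture description is added only when requested and available.
    if use_pictures and item.dish_description:
        pairs.append(("dish", item.dish_description))
    return " ".join(f"{key}: {value}" for key, value in pairs)


def flatten_history(history: List[CheckIn], use_pictures: bool = True) -> List[str]:
    """Flatten a visit history, keeping the order in which it was given."""
    return [flatten_item(item, use_pictures) for item in history]


# Made-up sample data; not taken from the Foursquare or FoodX-251 datasets.
history = [
    CheckIn("v1", "Ramen Restaurant", "Downtown", "a bowl of noodles in broth"),
    CheckIn("v2", "Sushi Restaurant", "Harbour"),
]

with_pictures = flatten_history(history, use_pictures=True)
without_pictures = flatten_history(history, use_pictures=False)
```

Here `with_pictures` corresponds to the semi-multimodal setting and `without_pictures` to the text-only baseline, so the two resulting inputs differ only in whether the dish descriptions are present.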