Recommendation systems have become popular and effective tools for helping users discover items of interest by modeling user preferences and item properties from implicit interactions (e.g., purchases and clicks). Humans perceive the world by processing signals from multiple modalities (e.g., audio, text, and images), which has inspired researchers to build recommender systems that can understand and interpret data from different modalities. Such models can capture hidden relations between modalities and may recover complementary information that neither a uni-modal approach nor implicit interactions alone can capture. The goal of this survey is to provide a comprehensive review of recent research on multimodal recommendation. Specifically, it presents a clear pipeline with the techniques commonly used at each step and classifies the models by the methods they use. Additionally, a code framework has been designed to help researchers new to this area understand the principles and techniques and easily run the SOTA models. Our framework is located at: https://github.com/enoche/MMRec