Personalized recommendation serves as a ubiquitous channel for users to discover information tailored to their interests. However, traditional recommendation models rely primarily on unique IDs and categorical features for user-item matching, and may therefore overlook the nuances of raw item content across modalities such as text, image, audio, and video. This underuse of multimodal data limits recommender systems, especially in multimedia services such as news, music, and short-video platforms. Recent advances in large multimodal models offer new opportunities and challenges for developing content-aware recommender systems. This survey provides a comprehensive review of recent advances and future directions in multimodal pretraining, adaptation, and generation techniques, as well as their applications in enhancing recommender systems. We also discuss open challenges and opportunities for future research in this area. We believe that this survey, together with the curated resources, will provide valuable insights and inspire further work in this evolving field.