This article presents a novel approach to multimodal recommendation that focuses on integrating and purifying multimodal data. Our method first applies a filter that removes noise from each type of data, making the recommendations more reliable. We study the impact of top-K sparsification on several datasets and identify, for each dataset, values of K that balance underfitting against overfitting. Our results show that textual information contributes more than visual data to a deep understanding of items. Sensitivity analyses examine how the choice of modalities and the use of the purifier circle loss affect the model's performance. The findings indicate that systems incorporating multiple modalities outperform those relying on a single modality. Our approach highlights the importance of modality purifiers in filtering out irrelevant data so that the retained information stays relevant to user preferences. Models without modality purifiers perform worse, which underscores the need to integrate pre-extracted features effectively. The proposed model, which includes a novel self-supervised auxiliary task, shows promise in accurately capturing user preferences. The main goal of the fusion technique is to improve the modeling of user preferences by combining knowledge with item information through advanced language models. Extensive experiments show that our model outperforms existing state-of-the-art multimodal recommendation systems.
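The abstract does not say how top-K sparsification is implemented. A common pattern in multimodal recommenders, assumed here and not confirmed by the text, is to build an item-item cosine-similarity graph from each modality's pre-extracted features and keep only each item's K most similar neighbours. The minimal pure-Python sketch below illustrates that assumed pattern; the function names `cosine_similarity_matrix` and `top_k_sparsify` and the toy feature vectors are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine_similarity_matrix(features: Matrix) -> Matrix:
    """Dense item-item cosine similarity from per-item feature vectors.

    Hypothetical helper: assumes each row of `features` is one item's
    pre-extracted feature vector for a single modality (e.g. text).
    """
    # Zero vectors get norm 1.0 so their similarities are 0 instead of NaN.
    norms = [math.sqrt(sum(x * x for x in f)) or 1.0 for f in features]
    n = len(features)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(features[i], features[j]))
            sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def top_k_sparsify(sim: Matrix, k: int) -> Matrix:
    """Keep the k most similar neighbours of each item (self excluded).

    Hypothetical helper: every other entry in a row is set to 0.0, so a
    smaller k gives a sparser item-item graph.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(sim)
    sparse = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank candidate neighbours by similarity, skipping the item itself.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sim[i][j],
            reverse=True,
        )
        for j in neighbours[:k]:
            sparse[i][j] = sim[i][j]
    return sparse


if __name__ == "__main__":
    # Toy text features: items 0/1 are near-duplicates, as are items 2/3.
    text_features = [
        [1.0, 0.0, 0.0],
        [0.9, 0.1, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.9, 0.1],
    ]
    graph = top_k_sparsify(cosine_similarity_matrix(text_features), k=1)
    for row in graph:
        print([round(v, 3) for v in row])
```

Under this assumed construction, K directly sets graph density: a very small K may drop useful neighbours, while a large K admits noisier edges, which is one plausible reading of the underfitting/overfitting trade-off the abstract reports. Models of this kind often normalize the sparsified matrix before message passing; that step is omitted here.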