Large-scale Benchmarks for Multimodal Recommendation with Ducho

The common multimodal recommendation pipeline involves (i) extracting multimodal features, (ii) refining their high-level representations to suit the recommendation task, (iii) optionally fusing all multimodal features, and (iv) predicting the user-item score. Although great effort has gone into designing optimal solutions for (ii)-(iv), to the best of our knowledge very little attention has been devoted to exploring procedures for (i) in a rigorous way. In this respect, the existing literature shows a wide availability of multimodal datasets and an ever-growing number of large models for multimodal-aware tasks but, at the same time, an unjustified reliance on a limited set of standardized solutions. As very recent works have begun to conduct empirical studies assessing the contribution of multimodality in recommendation, we follow and complement this research direction. To this end, this paper positions itself as the first attempt to offer large-scale benchmarking for multimodal recommender systems, with a specific focus on multimodal extractors. Specifically, we take advantage of three popular and recent frameworks, Ducho for multimodal feature extraction and MMRec and Elliot for reproducible recommendation, to offer a unified and ready-to-use experimental environment able to run extensive benchmarking analyses leveraging novel multimodal feature extractors. The results, extensively validated across different extractors, extractor hyper-parameters, domains, and modalities, provide important insights on how to train and tune the next generation of multimodal recommendation algorithms.
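
To make the four stages of the pipeline concrete, the following is a minimal, purely illustrative sketch in plain Python. It does not use Ducho, MMRec, or Elliot, and every name in it (extract_features, random_matrix, project, fuse, score) is hypothetical. Stage (i) is simulated with seeded random vectors standing in for the output of a pretrained extractor, stage (ii) is assumed to be a single linear projection, stage (iii) is assumed to be an element-wise mean, and stage (iv) is assumed to be a dot product; real recommenders typically learn these components.

```python
import random


def extract_features(item_ids, dim, seed):
    """Stage (i): stand-in for a pretrained extractor (assumption: random vectors)."""
    rng = random.Random(seed)
    return {i: [rng.gauss(0.0, 1.0) for _ in range(dim)] for i in item_ids}


def random_matrix(rows, cols, seed):
    """Hypothetical helper: a fixed random projection matrix (learned in practice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def project(features, weights):
    """Stage (ii): map each modality into a shared latent space (linear map)."""
    return {
        i: [sum(w * x for w, x in zip(row, vec)) for row in weights]
        for i, vec in features.items()
    }


def fuse(per_modality):
    """Stage (iii): optional fusion, here an element-wise mean across modalities."""
    items = per_modality[0].keys()
    return {
        i: [sum(vals) / len(vals) for vals in zip(*(m[i] for m in per_modality))]
        for i in items
    }


def score(user_vec, item_vec):
    """Stage (iv): predicted user-item score as a dot product."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


items = ["item_a", "item_b", "item_c"]
latent_dim = 4

# Two modalities with different raw dimensionalities, as extractors usually produce.
visual = extract_features(items, dim=8, seed=0)
textual = extract_features(items, dim=6, seed=1)

visual_latent = project(visual, random_matrix(latent_dim, 8, seed=2))
textual_latent = project(textual, random_matrix(latent_dim, 6, seed=3))
fused = fuse([visual_latent, textual_latent])

# A hypothetical user embedding; in practice this is learned from interactions.
user = [0.5, -0.2, 0.1, 0.3]
ranking = sorted(items, key=lambda i: score(user, fused[i]), reverse=True)
print(ranking)
```

Under these assumptions, swapping the extractor only changes what extract_features returns, while stages (ii)-(iv) stay untouched. This isolation of stage (i) is the property a benchmark focused on extractors relies on.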
