With the growing interest in Multimodal Recommender Systems (MRSs), collecting high-quality datasets with multimedia side information (text, images, audio, video) has become a fundamental step. However, most of the current literature relies on small- or medium-scale datasets that are either not publicly released or built through undocumented processes. In this paper, we aim to fill this gap by releasing M3L-10M and M3L-20M, two large-scale, reproducible, multimodal datasets for the movie domain, obtained by enriching the popular MovieLens-10M and MovieLens-20M, respectively, with multimodal features. Following a fully documented pipeline, we collect movie plots, posters, and trailers, from which we extract textual, visual, acoustic, and video features using several state-of-the-art encoders. We publicly release the mappings needed to download the original raw data, the extracted features, and the complete datasets in multiple formats, fostering reproducibility and advancing the field of MRSs. In addition, we conduct qualitative and quantitative analyses that characterize our datasets from several perspectives. This work represents a foundational step toward reproducibility and replicability in large-scale, multimodal movie recommendation. The resource is available at https://zenodo.org/records/18499145, and the source code at https://github.com/giuspillo/M3L_10M_20M.