Previous research has demonstrated the advantages of integrating data from multiple sources over relying on traditional unimodal data, and this has led to numerous novel multimodal applications. We propose MuG, a multimodal classification benchmark comprising eight datasets, to allow researchers to evaluate and improve their models. The datasets are collected from four different game genres and cover tabular, textual, and visual modalities. We conduct a multi-aspect data analysis to provide insights into the benchmark. It covers label balance ratios, percentages of missing features, the distribution of data within each modality, and the correlations between labels and input modalities. We further report experimental results from several state-of-the-art unimodal and multimodal classifiers, which demonstrate that the benchmark is challenging and multimodal-dependent. MuG, including the data, tutorials, and implemented baselines, is released at https://github.com/lujiaying/MUG-Bench.
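Two of the dataset statistics named above, label balance ratio and percentage of missing features, can be illustrated with a short sketch. It is not the benchmark's own analysis code. It assumes that the label balance ratio is the count of the rarest class divided by the count of the most frequent class, and that a feature counts as missing when its value is `None` or absent. The field names and sample rows are hypothetical and were invented for illustration.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def label_balance_ratio(labels: List[Any]) -> float:
    """Assumed definition: rarest-class count / most-frequent-class count.

    Returns 1.0 for a perfectly balanced label set; values near 0 indicate heavy imbalance.
    """
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


def missing_feature_percentage(rows: List[Dict[str, Optional[Any]]]) -> Dict[str, float]:
    """Percentage of rows in which each feature is missing (value None or key absent)."""
    features = sorted({key for row in rows for key in row})
    n = len(rows)
    return {
        f: 100.0 * sum(1 for row in rows if row.get(f) is None) / n
        for f in features
    }


if __name__ == "__main__":
    # Hypothetical labels: counts are common=3, rare=2, epic=1, so ratio = 1/3.
    labels = ["common", "rare", "common", "epic", "common", "rare"]
    print(label_balance_ratio(labels))

    # Hypothetical rows mixing a tabular, a textual, and a visual (image path) feature.
    rows = [
        {"attack": 5, "description": "Deals damage.", "image": "a.png"},
        {"attack": None, "description": "Draw a card.", "image": "b.png"},
        {"attack": 3, "description": None, "image": "c.png"},
        {"attack": 7, "description": "Taunt.", "image": None},
    ]
    print(missing_feature_percentage(rows))  # each feature is missing in 1 of 4 rows
```

Other definitions of label balance, such as entropy-based measures, are equally plausible. The min/max ratio is used here only because it is simple to read.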