Multimodal learning has attracted the interest of the machine learning community because of its potential in a wide range of applications. To help realize this potential, we propose MuG, a multimodal benchmark of eight datasets that lets researchers test the multimodal perception capabilities of their models. The datasets are collected from four different genres of games and cover tabular, textual, and visual modalities. We conduct a multi-aspect data analysis to provide insight into the benchmark, covering label balance ratios, percentages of missing features, the distribution of data within each modality, and the correlations between labels and input modalities. We further present experimental results from several state-of-the-art unimodal and multimodal classifiers, which demonstrate that the benchmark is challenging and that performance on it depends on using multiple modalities. MuG is released at https://github.com/lujiaying/MUG-Bench with the data, documentation, tutorials, and implemented baselines. Extensions of MuG are welcome to facilitate progress in research on multimodal learning problems.