Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. To accelerate progress on understudied modalities and tasks while ensuring real-world robustness, we release MultiZoo, a public toolkit consisting of standardized implementations of more than 20 core multimodal algorithms, and MultiBench, a large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. Together, these provide an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, we offer a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench paves the way towards a better understanding of the capabilities and limitations of multimodal models, while ensuring ease of use, accessibility, and reproducibility. Our toolkits are publicly available and will be regularly updated, and we welcome input from the community.
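The three evaluation axes named above (generalization, time and space complexity, and modality robustness) can be illustrated with a minimal, self-contained sketch. It does not use the MultiBench or MultiZoo API: the toy two-modality dataset, the nearest-centroid late-fusion model, and every function name below (`make_data`, `fit_centroids`, `corrupt`, `evaluate`) are hypothetical stand-ins chosen only to show how one report might cover all three axes.

```python
import random
import time

# All names here are illustrative assumptions, not part of MultiBench/MultiZoo.

def make_data(n, seed):
    """Toy dataset: two 2-D modalities whose means depend on a binary label."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 1.0 if y == 1 else -1.0
        mod_a = [center + rng.gauss(0, 0.5) for _ in range(2)]
        mod_b = [center + rng.gauss(0, 0.5) for _ in range(2)]
        data.append((mod_a, mod_b, y))
    return data

def fit_centroids(data):
    """Per-modality, per-class mean vectors; these are the model's only parameters."""
    sums = {(m, c): [0.0, 0.0] for m in (0, 1) for c in (0, 1)}
    counts = {0: 0, 1: 0}
    for mod_a, mod_b, y in data:
        counts[y] += 1
        for m, x in enumerate((mod_a, mod_b)):
            for i, v in enumerate(x):
                sums[(m, y)][i] += v
    return {k: [v / max(counts[k[1]], 1) for v in s] for k, s in sums.items()}

def predict(model, mod_a, mod_b):
    """Late fusion: add the squared distances to each class centroid across modalities."""
    def dist(x, c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    scores = [dist(mod_a, model[(0, c)]) + dist(mod_b, model[(1, c)]) for c in (0, 1)]
    return 0 if scores[0] <= scores[1] else 1

def accuracy(model, data):
    return sum(predict(model, a, b) == y for a, b, y in data) / len(data)

def corrupt(data, sigma, seed):
    """Add Gaussian noise of scale sigma to modality A only."""
    rng = random.Random(seed)
    return [([v + rng.gauss(0, sigma) for v in a], b, y) for a, b, y in data]

def evaluate(train, test, noise_levels=(0.0, 1.0, 4.0)):
    """Collect one report covering the three axes."""
    start = time.perf_counter()
    model = fit_centroids(train)
    train_seconds = time.perf_counter() - start
    return {
        "accuracy": accuracy(model, test),                        # (1) generalization
        "train_seconds": train_seconds,                           # (2) time
        "num_parameters": sum(len(v) for v in model.values()),    # (2) space
        "robustness": {s: accuracy(model, corrupt(test, s, seed=0))
                       for s in noise_levels},                    # (3) modality robustness
    }

report = evaluate(make_data(200, seed=1), make_data(200, seed=2))
print(report)
```

Reporting accuracy alongside cost and accuracy under corruption in one structure makes the trade-offs between the axes visible at a glance; a real benchmark would swap in actual datasets, models, and modality-specific corruptions.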