AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning. Distinguished by its ease of use, AutoMM enables fine-tuning of foundation models with just three lines of code. The library supports image, text, and tabular data, both independently and in combination, and offers a comprehensive suite of functionalities spanning classification, regression, object detection, semantic matching, and image segmentation. Experiments across diverse datasets and tasks show that AutoMM outperforms existing AutoML tools on basic classification and regression tasks, while achieving competitive results on advanced tasks, comparable to those of specialized toolboxes designed for such purposes.
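To make the "three lines of code" claim concrete, the following is a minimal sketch of what such a workflow might look like with the `MultiModalPredictor` class from the `autogluon.multimodal` package. The wrapper function `fine_tune_and_predict` and its argument names are hypothetical and introduced only for illustration. The data are assumed to be tables (for example pandas DataFrames) whose columns may hold text, image paths, and numeric or categorical features, with one column serving as the label.

```python
def fine_tune_and_predict(train_data, test_data, label):
    """Hypothetical helper: fine-tune on train_data, then predict on test_data.

    Assumes train_data and test_data are tabular objects accepted by AutoMM
    (e.g. pandas DataFrames) and that `label` names the target column.
    """
    # Imported lazily so this module loads even where AutoGluon is not installed.
    from autogluon.multimodal import MultiModalPredictor

    predictor = MultiModalPredictor(label=label)  # 1: declare the target column
    predictor.fit(train_data)                     # 2: fine-tune on the training data
    return predictor.predict(test_data)           # 3: predict on unseen data
```

The three numbered statements inside the function correspond to the three lines mentioned above. Everything else is scaffolding, and a real run would also require installing AutoGluon and supplying an actual dataset.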