The rapid development of large Vision-Language Models (VLMs) has led to impressive results on academic benchmarks, primarily in widely spoken languages. However, significant gaps remain in the ability of current VLMs to handle low-resource languages and varied cultural contexts, largely due to a lack of high-quality, diverse, and safety-vetted data. Consequently, these models often struggle to understand low-resource languages and cultural nuances, and to do so without producing toxic outputs. To address these limitations, we introduce Maya, an open-source Multimodal Multilingual model. Our contributions are threefold: 1) a multilingual image-text pretraining dataset in eight languages, based on the LLaVA pretraining dataset; 2) a thorough analysis of toxicity within the LLaVA dataset, followed by the creation of a novel toxicity-free version across eight languages; and 3) a multilingual image-text model supporting these languages, enhancing cultural and linguistic comprehension in vision-language tasks. Code is available at https://github.com/nahidalam/maya.