The rapid development of large language models (LLMs) is extending into large multimodal models (LMMs), which incorporate data types beyond text. However, multimodal training data is expensive to create. Constructing multilingual data for LMMs presents further challenges due to language diversity and complexity. In this study, we therefore propose two cost-effective methods to address this problem: (1) vocabulary expansion and pretraining of a multilingual LLM for specific languages, and (2) automatic and elaborate construction of multimodal datasets using GPT4-V. Based on these methods, we constructed a 91K English-Korean-Chinese multilingual, multimodal training dataset. Additionally, we developed a bilingual multimodal model that exhibits excellent performance in both Korean and English, surpassing existing approaches.