Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have taken the world by storm with impressive abilities in complex reasoning and linguistic comprehension. While there is a plethora of work on Vietnamese Large Language Models, the lack of high-quality multimodal resources limits the progress of Vietnamese MLLMs. In this paper, we take a first step toward addressing this gap by introducing LaVy, a state-of-the-art Vietnamese MLLM, together with LaVy-Bench, a benchmark designed for evaluating MLLMs' understanding of Vietnamese visual language tasks. All code and model weights are publicly available at https://github.com/baochi0212/LaVy