Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have taken the world by storm with their impressive abilities in complex reasoning and linguistic comprehension. While there is a plethora of work on Vietnamese Large Language Models, the lack of high-quality multimodal resources limits the progress of Vietnamese MLLMs. In this paper, we take a pioneering step to address this gap by introducing LaVy, a state-of-the-art Vietnamese MLLM. We also introduce LaVy-Bench, a benchmark designed to evaluate MLLMs' understanding of Vietnamese visual language tasks. All code and model weights are publicly available at https://github.com/baochi0212/LaVy