In this survey, we systematically analyze techniques for adapting large multimodal models (LMMs) to low-resource (LR) languages, examining approaches ranging from visual enhancement and data creation to cross-modal transfer and fusion strategies. Through a comprehensive analysis of 117 studies spanning 96 LR languages, we identify key patterns in how researchers tackle the challenges of limited data and computational resources. We categorize works into resource-oriented and method-oriented contributions, further dividing each into relevant sub-categories. We compare method-oriented contributions in terms of performance and efficiency, discussing the benefits and limitations of representative studies. We find that visual information often serves as a crucial bridge for improving model performance in LR settings, though significant challenges remain in areas such as hallucination mitigation and computational efficiency. In summary, we provide researchers with a clear understanding of current approaches and remaining challenges in making LMMs more accessible to speakers of LR (understudied) languages. We complement the survey with an open-source repository, available at: https://github.com/marianlupascu/LMM4LRL-Survey.