Large language models (LLMs) are at the forefront of transforming numerous domains globally. However, their inclusivity and effectiveness remain limited for non-Latin scripts and low-resource languages. This paper addresses the pressing challenge of improving the multilingual performance of LLMs without extensive training or fine-tuning. Through systematic investigation and evaluation across diverse languages on popular question-answering (QA) datasets, we present novel techniques that unlock the true potential of LLMs in a polyglot landscape. Our approach encompasses three key strategies that yield significant gains in multilingual proficiency. First, by carefully optimizing prompts tailored to polyglot LLMs, we unlock their latent capabilities, obtaining substantial performance boosts across languages. Second, we introduce a hybrid approach that combines LLM Retrieval-Augmented Generation (RAG) with multilingual embeddings, improving performance on multilingual tasks. Finally, we introduce a novel learning approach that dynamically selects the optimal prompt strategy, LLM, and embedding model for each query at run time. This dynamic adaptation maximizes the efficacy of LLMs across languages, outperforming the best static and random selection strategies. Moreover, our approach learns configurations in both offline and online settings and generalizes seamlessly to new languages and datasets, yielding substantial advances in multilingual understanding and generation across diverse languages.
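The abstract does not specify the mechanism behind the per-query selection, so the following is only a minimal illustrative sketch, not the paper's implementation: it frames the run-time choice of (prompt strategy, LLM, embedding model) as an epsilon-greedy bandit keyed by query language, with accuracy-style feedback (e.g., QA token F1) as the reward. All identifiers below, including the candidate lists and the commented-out `run_qa` and `score_answer` helpers, are hypothetical placeholders.

```python
# Sketch of a per-query configuration selector (epsilon-greedy bandit).
# Assumptions: a discrete set of candidate prompt strategies, LLMs, and
# embedding models, and a scalar reward per answered query.
import random
from collections import defaultdict

PROMPT_STRATEGIES = ["monolingual", "translate-test", "cross-lingual-cot"]
LLMS = ["llm-a", "llm-b"]                      # hypothetical model names
EMBEDDINGS = ["multilingual-emb", "english-emb"]

# Each arm is one full configuration the system can run a query with.
CONFIGS = [(p, m, e) for p in PROMPT_STRATEGIES
           for m in LLMS for e in EMBEDDINGS]


class DynamicSelector:
    """Tracks a running mean reward per (language, config) arm; picks the
    best-known arm with probability 1 - epsilon, explores otherwise."""

    def __init__(self, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.counts = defaultdict(int)    # (lang, config) -> times pulled
        self.values = defaultdict(float)  # (lang, config) -> mean reward

    def select(self, lang: str):
        if random.random() < self.epsilon:
            return random.choice(CONFIGS)          # explore
        # Exploit: best observed mean for this language (unseen arms score 0.0).
        return max(CONFIGS, key=lambda c: self.values[(lang, c)])

    def update(self, lang: str, config, reward: float):
        key = (lang, config)
        self.counts[key] += 1
        n = self.counts[key]
        # Incremental mean: m_n = m_{n-1} + (r - m_{n-1}) / n
        self.values[key] += (reward - self.values[key]) / n


selector = DynamicSelector()
config = selector.select(lang="hi")  # decided per query, at run time
# answer = run_qa(query, *config)                       # hypothetical QA call
# selector.update("hi", config, score_answer(answer))   # e.g., token-level F1
```

Under this framing, the "online setting" in the abstract corresponds to calling `update` as live feedback arrives, while the "offline setting" corresponds to pre-populating the reward estimates from a labeled QA set before deployment.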