Cost-Efficient Cross-Lingual Retrieval-Augmented Generation for Low-Resource Languages: A Case Study in Bengali Agricultural Advisory

Access to reliable agricultural advisory remains limited in many developing regions due to a persistent language barrier: authoritative agricultural manuals are predominantly written in English, while farmers primarily communicate in low-resource local languages such as Bengali. Although recent advances in Large Language Models (LLMs) enable natural language interaction, direct generation in low-resource languages often exhibits poor fluency and factual inconsistency, while cloud-based solutions remain cost-prohibitive. This paper presents a cost-efficient, cross-lingual Retrieval-Augmented Generation (RAG) framework for Bengali agricultural advisory that emphasizes factual grounding and practical deployability. The proposed system adopts a translation-centric architecture in which Bengali user queries are translated into English, enriched through domain-specific keyword injection to align colloquial farmer terminology with scientific nomenclature, and answered via dense vector retrieval over a curated corpus of English agricultural manuals (FAO, IRRI). The generated English response is subsequently translated back into Bengali to ensure accessibility. The system is implemented entirely using open-source models and operates on consumer-grade hardware without reliance on paid APIs. Experimental evaluation demonstrates reliable source-grounded responses, robust rejection of out-of-domain queries, and an average end-to-end latency below 20 seconds. The results indicate that cross-lingual retrieval combined with controlled translation offers a practical and scalable solution for agricultural knowledge access in low-resource language settings.
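The pipeline described above (translate, inject keywords, retrieve, reject out-of-domain queries, translate back) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the glossary entries, the two-sentence corpus, the 0.2 similarity threshold, and all function names are hypothetical. A bag-of-words cosine similarity stands in for the dense embedding model, the retrieved passage stands in for the LLM-generated answer, and the translation steps are passed in as plain callables.

```python
import math
import re
from collections import Counter

# Hypothetical glossary mapping colloquial terms to scientific keywords.
GLOSSARY = {"leaf burn": ["bacterial leaf blight"]}

# Toy stand-in for the curated English manual corpus.
CORPUS = [
    "Bacterial leaf blight of rice causes yellowing and drying of leaf tips.",
    "Apply nitrogen fertilizer in split doses during tillering.",
]


def inject_keywords(query_en, glossary=GLOSSARY):
    """Append scientific keywords for any colloquial term found in the query."""
    lowered = query_en.lower()
    extras = [kw for term, kws in glossary.items() if term in lowered for kw in kws]
    return query_en if not extras else query_en + " " + " ".join(extras)


def embed(text):
    """Bag-of-words counts; a stand-in for a dense sentence embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query_en, corpus=CORPUS, min_score=0.2):
    """Return the best-matching passage, or None when nothing is similar enough."""
    q = embed(query_en)
    score, passage = max((cosine(q, embed(p)), p) for p in corpus)
    return passage if score >= min_score else None


def answer(query_bn, to_english, to_bengali):
    """Run the full loop; the translators are caller-supplied callables."""
    query_en = inject_keywords(to_english(query_bn))
    passage = retrieve(query_en)
    if passage is None:
        # Out-of-domain: refuse instead of answering without a source.
        return to_bengali("Sorry, this question is outside the agricultural manuals.")
    # A real system would prompt an LLM with the passage; here we return it as-is.
    return to_bengali(passage)
```

With identity functions in place of the two translators, a query such as "My rice has leaf burn" is expanded with "bacterial leaf blight" and matched to the first passage, while an unrelated question falls below the threshold and is refused. The threshold-based refusal is one simple way to obtain out-of-domain rejection; the abstract does not state which mechanism the system uses.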
