The advent of Large Language Models (LLMs) has revolutionized Natural Language Processing, yet their application in high-stakes, specialized domains such as religious question answering is hindered by hallucination and unfaithfulness to authoritative sources. These problems are especially critical for the Persian-speaking Muslim community, for whom accuracy and trustworthiness are paramount. Existing Retrieval-Augmented Generation (RAG) systems rely on simple single-pass pipelines and fall short on complex, multi-hop queries that require multi-step reasoning and evidence aggregation. To address this gap, we introduce FARSIQA, a novel end-to-end system for Faithful Advanced Question Answering in the Persian Islamic domain. FARSIQA is built on our FAIR-RAG architecture, a Faithful, Adaptive, Iterative Refinement framework for RAG. FAIR-RAG uses a dynamic, self-correcting process: it adaptively decomposes complex queries, assesses whether the retrieved evidence is sufficient, and enters an iterative loop that generates sub-queries to progressively fill the remaining information gaps. FARSIQA operates on a curated knowledge base of over one million authoritative Islamic documents. Evaluated on the challenging IslamicPCQA benchmark, it achieves state-of-the-art performance: 97.0% on Negative Rejection, a 40-point improvement over baselines, and an Answer Correctness score of 74.3%. Our work establishes a new standard for Persian Islamic QA and validates that an iterative, adaptive architecture is crucial for building faithful, reliable AI systems in sensitive domains.
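The decompose, assess, and refine loop described above can be illustrated with a minimal Python sketch. This is not the FAIR-RAG implementation: `iterative_rag`, `LoopResult`, the toy corpus, and every component function are hypothetical stand-ins. In a real system, decomposition, gap finding, sub-query generation, and answer generation would presumably be LLM calls, and retrieval would query the knowledge base. The sketch also assumes that the system abstains when gaps remain after a fixed iteration budget, which is one plausible way an iterative loop could support Negative Rejection.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    answer: Optional[str]  # None means the system abstained (no grounded answer)
    evidence: List[str]    # all unique passages gathered across iterations
    iterations: int        # number of retrieve-and-assess rounds performed


def iterative_rag(
    question: str,
    decompose: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    find_gaps: Callable[[str, List[str]], List[str]],
    gaps_to_queries: Callable[[List[str]], List[str]],
    generate: Callable[[str, List[str]], str],
    max_iters: int = 3,
) -> LoopResult:
    """Hypothetical decompose -> retrieve -> assess -> refine loop."""
    queries = decompose(question)  # may simply return [question] for simple queries
    evidence: List[str] = []
    for i in range(1, max_iters + 1):
        # Retrieve for every current query, keeping passages de-duplicated.
        for q in queries:
            for passage in retrieve(q):
                if passage not in evidence:
                    evidence.append(passage)
        # Sufficiency check: an empty gap list means the evidence suffices.
        gaps = find_gaps(question, evidence)
        if not gaps:
            return LoopResult(generate(question, evidence), evidence, i)
        # Otherwise, turn each remaining gap into a targeted sub-query.
        queries = gaps_to_queries(gaps)
    # Budget exhausted with gaps remaining (assumption): abstain rather than guess.
    return LoopResult(None, evidence, max_iters)


# --- Toy components (all hypothetical) ---------------------------------
CORPUS = {
    "author": "Book X was written by Scholar Y.",
    "birthplace": "Scholar Y was born in City Z.",
}


def toy_retrieve(query: str) -> List[str]:
    # Keyword match stands in for a real dense or sparse retriever.
    return [text for key, text in CORPUS.items() if key in query]


def toy_find_gaps(question: str, evidence: List[str]) -> List[str]:
    # Pretend this question needs both facts; report the ones still missing.
    return [key for key in ("author", "birthplace") if CORPUS[key] not in evidence]


def toy_decompose(question: str) -> List[str]:
    # A fixed first sub-query stands in for LLM-based decomposition.
    return ["author of Book X"]


def toy_gaps_to_queries(gaps: List[str]) -> List[str]:
    # Each missing fact becomes one targeted follow-up query.
    return [f"{gap} of Scholar Y" for gap in gaps]


def toy_generate(question: str, evidence: List[str]) -> str:
    # A real generator would compose an answer grounded in the evidence.
    return " ".join(evidence)


if __name__ == "__main__":
    result = iterative_rag(
        "Where was the author of Book X born?",
        toy_decompose, toy_retrieve, toy_find_gaps,
        toy_gaps_to_queries, toy_generate,
    )
    print(result.iterations, result.answer)
```

Here the first retrieval finds only the author passage, the gap check flags the missing birthplace, and a second round with a targeted sub-query completes the evidence, so the demo prints `2` followed by both passages. With `max_iters=1` the loop stops with a gap still open and returns `answer=None`.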