Large Language Models (LLMs) have made significant progress in reasoning tasks across domains such as mathematics and coding. However, their performance deteriorates on tasks requiring rich socio-cultural knowledge and diverse local contexts, particularly those involving Indian culture. Existing cultural benchmarks are (i) manually crafted, (ii) limited to single-hop questions that test factual recall, and (iii) prohibitively costly to scale, leaving this deficiency largely unmeasured. To address this, we introduce VIRAASAT, a novel semi-automated approach for generating a culture-specific multi-hop question-answering dataset for Indian culture. VIRAASAT leverages a Knowledge Graph comprising more than 700 expert-curated cultural artifacts, covering 13 key attributes of Indian culture (e.g., history, festivals). The dataset spans all 28 states and 8 Union Territories, yielding more than 3,200 multi-hop questions that necessitate chained cultural reasoning. We evaluate current state-of-the-art (SOTA) LLMs on VIRAASAT and identify a key limitation in their reasoning: fine-tuning on Chain-of-Thought (CoT) traces fails to ground and synthesize low-probability facts. To bridge this gap, we propose a novel framework named Symbolic Chain-of-Manipulation (SCoM). Adapting the Chain-of-Manipulation paradigm, we train the model to simulate atomic Knowledge Graph manipulations internally, teaching it to reliably traverse the topological structure of the graph. Supervised Fine-Tuning (SFT) experiments demonstrate that SCoM outperforms standard CoT baselines by up to 20%. We release the VIRAASAT dataset along with our findings, laying a strong foundation for building culturally aware reasoning models.
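The chained cultural reasoning described above can be illustrated with a minimal sketch of multi-hop traversal over a knowledge graph. All entity names, relations, and the helper functions below are hypothetical placeholders for illustration, not actual VIRAASAT data or the paper's implementation:

```python
from typing import Dict, List, Tuple

# Toy KG as an adjacency map: subject -> list of (relation, object) edges.
# Entities and relations are invented for this sketch.
KG: Dict[str, List[Tuple[str, str]]] = {
    "Festival_X": [("celebrated_in", "State_A")],
    "State_A": [("state_dance", "Dance_B"), ("capital", "City_C")],
    "Dance_B": [("originated_in_era", "Era_D")],
}

def hop(entity: str, relation: str) -> List[str]:
    """Atomic KG manipulation: follow one relation from an entity."""
    return [obj for rel, obj in KG.get(entity, []) if rel == relation]

def multi_hop(start: str, relations: List[str]) -> List[str]:
    """Chain atomic hops: each hop's answers seed the next hop."""
    frontier = [start]
    for rel in relations:
        frontier = [obj for ent in frontier for obj in hop(ent, rel)]
    return frontier

# A 2-hop question: "Which dance form belongs to the state where
# Festival_X is celebrated?"
print(multi_hop("Festival_X", ["celebrated_in", "state_dance"]))
# → ['Dance_B']
```

Under this framing, a standard CoT trace verbalizes the intermediate answers in free text, whereas an SCoM-style trace would express each step as an explicit, atomic graph operation like `hop` above.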