Large Language Models (LLMs) have made significant progress in reasoning tasks across domains such as mathematics and coding. However, their performance deteriorates on tasks requiring rich socio-cultural knowledge and diverse local contexts, particularly those involving Indian culture. Existing cultural benchmarks are (i) manually crafted, (ii) limited to single-hop questions that test factual recall, and (iii) prohibitively costly to scale, leaving this deficiency largely unmeasured. To address this, we introduce VIRAASAT, a novel semi-automated approach for generating a culture-specific multi-hop question-answering dataset for Indian culture. VIRAASAT leverages a Knowledge Graph comprising more than 700 expert-curated cultural artifacts, covering 13 key attributes of Indian culture (e.g., history, festivals). The dataset spans all 28 states and 8 Union Territories, yielding more than 3,200 multi-hop questions that necessitate chained cultural reasoning. We evaluate current state-of-the-art (SOTA) LLMs on VIRAASAT and identify a key limitation in their reasoning: fine-tuning on Chain-of-Thought (CoT) traces fails to ground and synthesize low-probability facts. To bridge this gap, we propose a novel framework named Symbolic Chain-of-Manipulation (SCoM). Adapting the Chain-of-Manipulation paradigm, we train the model to simulate atomic Knowledge Graph manipulations internally, teaching it to reliably traverse the topological structure of the graph. Supervised Fine-Tuning (SFT) experiments demonstrate that SCoM outperforms standard CoT baselines by up to 20%. We release the VIRAASAT dataset along with our findings, laying a strong foundation for building culturally aware reasoning models.
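The chained cultural reasoning described above can be illustrated with a minimal sketch of multi-hop traversal over a knowledge graph. All entity names, relations, and the helper functions below are hypothetical placeholders for illustration, not actual VIRAASAT data or the paper's implementation:

```python
from typing import Dict, List, Tuple

# Toy KG as an adjacency map: subject -> list of (relation, object) edges.
# Entities and relations are invented for this sketch.
KG: Dict[str, List[Tuple[str, str]]] = {
    "Festival_X": [("celebrated_in", "State_A")],
    "State_A": [("state_dance", "Dance_B"), ("capital", "City_C")],
    "Dance_B": [("originated_in_era", "Era_D")],
}

def hop(entity: str, relation: str) -> List[str]:
    """Atomic KG manipulation: follow one relation from an entity."""
    return [obj for rel, obj in KG.get(entity, []) if rel == relation]

def multi_hop(start: str, relations: List[str]) -> List[str]:
    """Chain atomic hops: each hop's answers seed the next hop."""
    frontier = [start]
    for rel in relations:
        frontier = [obj for ent in frontier for obj in hop(ent, rel)]
    return frontier

# A 2-hop question: "Which dance form belongs to the state where
# Festival_X is celebrated?"
print(multi_hop("Festival_X", ["celebrated_in", "state_dance"]))
# → ['Dance_B']
```

Under this framing, a standard CoT trace verbalizes the intermediate answers in free text, whereas an SCoM-style trace would express each step as an explicit, atomic graph operation like `hop` above.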