Legal reasoning is not semantic similarity search. A court judgment encodes constrained symbolic reasoning: precedent propagation, procedural state transitions, and statute-bound inference. These are properties that vector-based retrieval-augmented generation (RAG) cannot faithfully represent. Hallucinated precedents, outdated statute citations, and unsupported reasoning chains remain persistent failure modes in LLM-based legal AI, with real consequences for access to justice in high-caseload jurisdictions such as India. This paper presents Falkor-IRAC, a graph-constrained generation framework for Indian legal AI that grounds generation in structured reasoning over an IRAC (Issue, Rule, Analysis, Conclusion) knowledge graph. Judgments from the Supreme Court and High Courts of India are ingested as IRAC node structures enriched with procedural state transitions, precedent relationships, and statutory references, stored in FalkorDB for low-latency agentic traversal. At inference time, LLM-generated answers are accepted only if a valid supporting path can be traced through the graph, a check performed by a falsifiability oracle called the Verifier Agent. The system also detects doctrinal conflicts as a first-class output rather than silently resolving them. Falkor-IRAC is evaluated using graph-native metrics: citation grounding accuracy, path validity rate, hallucinated precedent rate, and conflict detection rate. These metrics are argued to be more appropriate for legal reasoning evaluation than BLEU and ROUGE. On a proof-of-concept corpus of 51 Supreme Court judgments, the Verifier Agent correctly validated citations on completed queries and correctly rejected fabricated citations. Evaluation against vector-only RAG baselines is left for future work. The companion InIRAC dataset, 500+ structured Indian court judgments with IRAC annotations, is released alongside this paper.
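The acceptance rule described above (an answer stands only if a supporting path can be traced through the graph) can be illustrated with a minimal sketch. It is not the paper's Verifier Agent: it assumes a plain in-memory adjacency map in place of FalkorDB, and the node identifiers, the relation names in `ALLOWED_RELATIONS`, and the functions `build_graph`, `find_supporting_path`, and `verify_answer` are all hypothetical names chosen for illustration.

```python
from collections import deque

# Hypothetical relation names; the actual graph schema is not given in the abstract.
ALLOWED_RELATIONS = {"GOVERNED_BY", "APPLIES", "CITES"}


def build_graph(edges):
    """Build an adjacency map {source: [(relation, target), ...]} from triples."""
    graph = {}
    for source, relation, target in edges:
        graph.setdefault(source, []).append((relation, target))
        graph.setdefault(target, [])  # register nodes that only appear as targets
    return graph


def find_supporting_path(graph, start, goal):
    """Breadth-first search restricted to allowed relations.

    Returns the list of nodes from start to goal, or None if no such path exists.
    """
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for relation, neighbor in graph[node]:
            if relation in ALLOWED_RELATIONS and neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


def verify_answer(graph, issue, cited_nodes):
    """Accept an answer only if every cited node is reachable from the issue node."""
    paths = {}
    for cited in cited_nodes:
        path = find_supporting_path(graph, issue, cited)
        if path is None:
            return False, {}  # one unsupported citation rejects the whole answer
        paths[cited] = path
    return True, paths


# Toy graph with placeholder identifiers (not real judgments or statutes).
EDGES = [
    ("issue:bail_delay", "GOVERNED_BY", "rule:statute_A"),
    ("rule:statute_A", "APPLIES", "case:X_v_Y"),
    ("case:X_v_Y", "CITES", "case:P_v_Q"),
    ("case:P_v_Q", "OVERRULED_BY", "case:R_v_S"),  # not an allowed support relation
]
GRAPH = build_graph(EDGES)

if __name__ == "__main__":
    print(verify_answer(GRAPH, "issue:bail_delay", ["case:P_v_Q"]))
    print(verify_answer(GRAPH, "issue:bail_delay", ["case:Fabricated_v_Nobody"]))
```

In this sketch a citation that is absent from the graph, or reachable only through a relation outside the allowed set, causes the answer to be rejected, which mirrors the falsifiability idea: the check can fail, and a failure is reported rather than smoothed over.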