Legal reasoning is not semantic similarity search. A court judgment encodes constrained symbolic reasoning: precedent propagation, procedural state transitions, and statute-bound inference. These are properties that vector-based retrieval-augmented generation (RAG) cannot faithfully represent. Hallucinated precedents, outdated statute citations, and unsupported reasoning chains remain persistent failure modes in LLM-based legal AI, with real consequences for access to justice in high-caseload jurisdictions such as India. This paper presents Falkor-IRAC, a graph-constrained generation framework for Indian legal AI that grounds generation in structured reasoning over an IRAC (Issue, Rule, Analysis, Conclusion) knowledge graph. Judgments from the Supreme Court and High Courts of India are ingested as IRAC node structures enriched with procedural state transitions, precedent relationships, and statutory references, stored in FalkorDB for low-latency agentic traversal. At inference time, LLM-generated answers are accepted only if a valid supporting path can be traced through the graph, a check performed by a falsifiability oracle called the Verifier Agent. The system also detects doctrinal conflicts as a first-class output rather than silently resolving them. Falkor-IRAC is evaluated using graph-native metrics: citation grounding accuracy, path validity rate, hallucinated precedent rate, and conflict detection rate. These metrics are argued to be more appropriate for legal reasoning evaluation than BLEU and ROUGE. On a proof-of-concept corpus of 51 Supreme Court judgments, the Verifier Agent correctly validated citations on completed queries and correctly rejected fabricated citations. Evaluation against vector-only RAG baselines is left for future work, as is GPU-accelerated inference to address current timeout rates on CPU hardware.
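The acceptance rule described above, where an answer is kept only if a supporting path can be traced through the graph, can be illustrated with a minimal sketch. It is not the Falkor-IRAC implementation: it uses a plain in-memory adjacency map instead of FalkorDB. The node identifiers, relation names, and the functions `build_adjacency`, `find_support_path`, and `verify_answer` are all hypothetical, introduced only for illustration. The sketch also assumes that "valid supporting path" means simple directed reachability from the issue node to each cited node; the actual Verifier Agent may apply stricter constraints.

```python
from collections import deque

# Hypothetical toy graph of (source, relation, target) triples.
# Node ids and relation names are illustrative, not the paper's schema.
EDGES = [
    ("issue:q1", "GOVERNED_BY", "rule:statute_s1"),
    ("rule:statute_s1", "APPLIED_IN", "analysis:case_a"),
    ("analysis:case_a", "LEADS_TO", "conclusion:case_a"),
    ("analysis:case_a", "CITES", "precedent:case_b"),
]


def build_adjacency(edges):
    """Turn the triples into a directed adjacency map (relation labels are dropped)."""
    adj = {}
    for src, _relation, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set())
    return adj


def find_support_path(adj, issue, citation):
    """Breadth-first search; return the node path from issue to citation, or None."""
    if issue not in adj or citation not in adj:
        return None  # a citation absent from the graph cannot be grounded
    parent = {issue: None}
    queue = deque([issue])
    while queue:
        node = queue.popleft()
        if node == citation:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj[node]):  # sorted for deterministic traversal
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def verify_answer(adj, issue, cited):
    """Accept only if there is at least one citation and every citation has a path."""
    paths = {c: find_support_path(adj, issue, c) for c in cited}
    rejected = sorted(c for c, p in paths.items() if p is None)
    return {
        "accepted": bool(cited) and not rejected,
        "paths": paths,
        "rejected": rejected,
    }


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    print(verify_answer(adj, "issue:q1", ["precedent:case_b"]))
    print(verify_answer(adj, "issue:q1", ["precedent:fabricated"]))
```

In this sketch, a citation missing from the graph is rejected outright, which mirrors the rejection of fabricated citations described above. An answer with no citations at all is also rejected, which is a design choice of the sketch rather than something the abstract specifies.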