The advent of Large Language Models (LLMs) has catalyzed AI adoption in the finance sector, yet their reliability on complex, jurisdiction-specific tasks such as Indian Chartered Accountancy (CA) remains limited. These models struggle with multi-step numerical reasoning and with tasks that demand nuanced knowledge of legal regulations, and scaling them up is not feasible in resource-constrained settings. We present CA-ThinkFlow, a parameter-efficient Retrieval-Augmented Generation (RAG) framework that combines a 14B, 4-bit-quantized reasoning model, 14B-DeepSeek-R1, with a layout-aware Docling extraction pipeline that preserves document structure during extraction. CA-ThinkFlow uses a basic RAG method that automatically inserts the retrieved information into the prompt, and it relies on the model's built-in Chain-of-Thought (CoT) capabilities to interpret that context and produce correct answers. Evaluated on the multi-level CA-Ben benchmark, the system performs at a level comparable to large proprietary models, achieving a Scholastic Reliability Coefficient (SRC) equal to 68.75\% of that of GPT-4o and Claude 3.5 Sonnet. The framework is parameter-efficient and robust, but its core reasoning abilities still fall short on complex regulatory texts in areas such as Taxation.
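The basic RAG step described in the abstract, in which retrieved passages are placed directly into the prompt and the reasoning is left to the model's own Chain-of-Thought, can be illustrated with a minimal sketch. This is not the CA-ThinkFlow implementation: the function names (`tokenize`, `retrieve`, `build_prompt`), the word-overlap scoring, the prompt wording, and the toy passages are all assumptions made for illustration, and a real system would presumably use a proper retriever over Docling-extracted chunks.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks sharing the most tokens with the query.

    Word overlap is a stand-in (assumption) for whatever retriever
    the actual framework uses.
    """
    query_tokens = tokenize(query)
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_tokens & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, chunks: List[str], k: int = 2) -> str:
    """Insert the retrieved chunks into the prompt with no further processing."""
    retrieved = retrieve(query, chunks, k)
    context = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved)
    )
    # Hypothetical prompt template; the reasoning itself is left to the model.
    return (
        "Use the context below to answer the question. "
        "Think step by step.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


if __name__ == "__main__":
    # Toy passages, invented for illustration only.
    toy_chunks = [
        "Rule A: the filing deadline for form X is the last day of the quarter.",
        "Rule B: depreciation is computed on the opening balance of the asset.",
        "Rule C: late filing of form X attracts a fixed penalty.",
    ]
    print(build_prompt("What is the filing deadline for form X?", toy_chunks))
```

The resulting string would be passed unchanged to the quantized reasoning model. The sketch does no reranking, query rewriting, or post-processing, in keeping with the abstract's description of the method as basic.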