Architecture Matters More Than Scale: A Comparative Study of Retrieval and Memory Augmentation for Financial QA Under SME Compute Constraints

From arXiv. Accepted at the 2026 6th International Conference on Artificial Intelligence and Industrial Technology Applications (AIITA 2026), to be published by IEEE. 12 pages, 5 figures.

The rapid adoption of artificial intelligence (AI) and large language models (LLMs) is transforming financial analytics by enabling natural language interfaces for reporting, decision support, and automated reasoning. However, there is limited empirical understanding of how different LLM-based reasoning architectures perform across realistic financial workflows, particularly under the cost, accuracy, and compliance constraints faced by small and medium-sized enterprises (SMEs). SMEs typically operate under severe infrastructure constraints: they lack cloud GPU budgets, dedicated AI teams, and API-scale inference capacity, which makes architectural efficiency a first-class concern. To ensure practical relevance, we introduce an explicit SME-constrained evaluation setting in which all experiments are conducted using a locally hosted 8B-parameter instruction-tuned model without cloud-scale infrastructure. This design isolates the impact of architectural choices within a realistic deployment environment. We systematically compare four reasoning architectures (a baseline LLM, retrieval-augmented generation (RAG), structured long-term memory, and memory-augmented conversational reasoning) on both the FinQA and ConvFinQA benchmarks. Results reveal a consistent architectural inversion: structured memory improves precision in deterministic, operand-explicit tasks, while retrieval-based approaches outperform memory-centric methods in conversational, reference-implicit settings. Based on these findings, we propose a hybrid deployment framework that dynamically selects reasoning strategies to balance numerical accuracy, auditability, and infrastructure efficiency, providing a practical pathway for financial AI adoption in resource-constrained environments.
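The idea of dynamically selecting a reasoning strategy can be illustrated with a small routing sketch. The abstract does not describe how the proposed framework makes this choice, so everything below is an assumption made for illustration: the `Query` type, the cue-word list, the two heuristics, the strategy labels, and the fallback to the baseline are all hypothetical and are not the authors' method. The sketch only mirrors the reported inversion, sending reference-implicit conversational questions to retrieval and operand-explicit questions to structured memory.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical cue words that often signal a reference to an earlier turn.
IMPLICIT_REFERENCE_CUES = {
    "it", "that", "this", "those", "these", "same", "previous", "prior",
}


@dataclass
class Query:
    """A user question plus any earlier conversation turns (illustrative type)."""
    question: str
    history: List[str] = field(default_factory=list)


def has_implicit_reference(query: Query) -> bool:
    """True when a follow-up question leans on earlier turns (toy heuristic)."""
    if not query.history:
        return False
    tokens = set(re.findall(r"[a-z]+", query.question.lower()))
    return bool(tokens & IMPLICIT_REFERENCE_CUES)


def has_explicit_operands(query: Query) -> bool:
    """True when the question itself states at least two numeric operands."""
    numbers = re.findall(r"\d+(?:[.,]\d+)*", query.question)
    return len(numbers) >= 2


def select_strategy(query: Query) -> str:
    """Pick a strategy label; the labels and the fallback are assumptions."""
    if has_implicit_reference(query):
        return "rag"                # conversational, reference-implicit
    if has_explicit_operands(query):
        return "structured_memory"  # deterministic, operand-explicit
    return "baseline"               # assumed default when neither cue fires


if __name__ == "__main__":
    single_turn = Query("What is the percentage change from 120.5 to 150.0?")
    follow_up = Query(
        "And what about that in the previous year?",
        history=["What was net revenue in 2019?"],
    )
    print(select_strategy(single_turn))
    print(select_strategy(follow_up))
```

In a real deployment the two heuristics would likely be replaced by a learned classifier or by signals from the application itself (for example, whether a session has prior turns), and each label would dispatch to the corresponding pipeline running on the locally hosted model.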
