Reliable data-driven decision-making is crucial in domains where analytical accuracy directly impacts safety, compliance, or operational outcomes. Decision support in such domains relies on large tabular datasets, where manual analysis is slow, costly, and error-prone. While Large Language Models (LLMs) offer promising automation potential, they face challenges in analytical reasoning, structured data handling, and ambiguity resolution. This paper introduces GateLens, an LLM-based architecture for reliable analysis of complex tabular data. Its key innovation is the use of Relational Algebra (RA) as a formal intermediate representation between natural-language reasoning and executable code, addressing the reasoning-to-code gap that can arise in direct generation approaches. In our automotive instantiation, GateLens translates natural-language queries into RA expressions and then generates optimized Python code. Unlike traditional multi-agent or planning-based systems, which can be slow, opaque, and costly to maintain, GateLens emphasizes speed, transparency, and reliability. We validate the architecture in automotive software release analytics, where experimental results show that GateLens outperforms an existing Chain-of-Thought (CoT) + Self-Consistency (SC) based system on real-world datasets, particularly in handling complex and ambiguous queries. Ablation studies confirm the essential role of the RA layer. Industrial deployment demonstrates a reduction of over 80% in analysis time while maintaining high accuracy across domain-specific tasks. GateLens operates effectively in zero-shot settings, requiring neither few-shot examples nor agent orchestration. This work advances deployable LLM system design by identifying architectural features crucial for domain-specific analytical applications: intermediate formal representations, execution efficiency, and low configuration overhead.
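To make the natural-language-to-RA-to-code pipeline concrete, the following is a minimal illustrative sketch. The table schema, column names, query, and RA expression are hypothetical examples for exposition, not taken from the paper; GateLens's actual prompts and generated code may differ.

```python
# Hypothetical sketch: a natural-language query is first mapped to a
# Relational Algebra (RA) expression, which then guides code generation.
import pandas as pd

# Toy release-test table (illustrative schema, not from the paper).
tests = pd.DataFrame({
    "test_id": [1, 2, 3, 4],
    "release": ["R1", "R1", "R2", "R2"],
    "status":  ["pass", "fail", "fail", "pass"],
})

# Query: "Which tests failed in release R1?"
# RA intermediate form: pi_{test_id}( sigma_{release='R1' AND status='fail'}(tests) )
failed_r1 = tests.loc[
    (tests["release"] == "R1") & (tests["status"] == "fail"),  # sigma (selection)
    ["test_id"],                                                # pi (projection)
]
print(failed_r1["test_id"].tolist())  # [2]
```

The RA layer pins down the selection and projection semantics before any code exists, which is the gap-bridging role the abstract attributes to it.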