Reliable and effective software release decisions are critical, particularly in safety-critical domains such as automotive systems. Precise analysis of release validation data, often presented in tabular form, plays a pivotal role in this process. However, traditional methods that rely on manual analysis of extensive test datasets and validation metrics lead to delays and high costs. Large Language Models (LLMs) offer a promising alternative but face challenges in analytical reasoning, contextual understanding, handling out-of-scope queries, and processing structured test data consistently. These limitations hinder their direct application in safety-critical scenarios. This paper introduces GateLens, an LLM-based tool for analyzing tabular data in the automotive domain. GateLens translates natural language queries into Relational Algebra (RA) expressions and then generates optimized Python code. It outperforms the baseline system on benchmarking datasets, achieving higher F1 scores and handling complex and ambiguous queries with greater robustness. Ablation studies confirm the critical role of the RA module: performance drops sharply when it is omitted. Industrial evaluations show that GateLens reduces analysis time by over 80% while maintaining high accuracy and reliability. GateLens achieves this performance without relying on few-shot examples, demonstrating strong generalization across query types from diverse company roles. Insights from deploying GateLens with a partner automotive company offer practical guidance for integrating AI into critical workflows such as release validation. The results show that by automating test result analysis, GateLens enables faster, better-informed, and more dependable release decisions, and can thus advance software scalability and reliability in automotive systems.
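
To make the query-to-RA-to-Python idea concrete, the following is a minimal sketch and not GateLens's actual implementation. It assumes a hypothetical test-results table with columns `test_id`, `release`, and `status`. The helper functions `select` and `project` and the sample query are illustrative names introduced here, not taken from the paper.

```python
from typing import Callable

Row = dict[str, object]


def select(rows: list[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """RA selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows: list[Row], columns: list[str]) -> list[Row]:
    """RA projection (pi): keep only the named columns and drop duplicates."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[col] for col in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


# Hypothetical release validation data (illustrative only).
test_results = [
    {"test_id": "T1", "release": "R1", "status": "failed"},
    {"test_id": "T2", "release": "R1", "status": "passed"},
    {"test_id": "T3", "release": "R2", "status": "failed"},
    {"test_id": "T1", "release": "R1", "status": "failed"},
]

# Natural language query: "Which tests failed in release R1?"
# RA expression: pi_{test_id}(sigma_{release = 'R1' AND status = 'failed'}(test_results))
failed_in_r1 = project(
    select(
        test_results,
        lambda row: row["release"] == "R1" and row["status"] == "failed",
    ),
    ["test_id"],
)

print(failed_in_r1)  # [{'test_id': 'T1'}]
```

In this sketch the RA expression is an explicit intermediate step between the question and the code, so each operator can be checked against the query on its own before any Python runs.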