Traditional methods for making software deployment decisions in the automotive industry typically rely on manual analysis of tabular software test data. These methods often lead to higher costs and delays in the software release cycle due to their labor-intensive nature. Large Language Models (LLMs) present a promising solution to these challenges. However, their application generally demands multiple rounds of human-driven prompt engineering, which limits their practical deployment, particularly for industrial end-users who need reliable and efficient results. In this paper, we propose GoNoGo, an LLM agent system designed to streamline automotive software deployment while meeting both functional requirements and practical industrial constraints. Unlike previous systems, GoNoGo is specifically tailored to address domain-specific and risk-sensitive systems. We evaluate GoNoGo's performance across different task difficulties using zero-shot and few-shot examples taken from industrial practice. Our results show that GoNoGo achieves a 100% success rate for tasks up to Level 2 difficulty with 3-shot examples, and maintains high performance even for more complex tasks. We find that GoNoGo effectively automates decision-making for simpler tasks, significantly reducing the need for manual intervention. In summary, GoNoGo represents an efficient and user-friendly LLM-based solution currently employed in our industrial partner's company to assist with software release decision-making, supporting more informed and timely decisions in the release process for risk-sensitive vehicle systems.