Large Language Models (LLMs) are taking many industries by storm. Their steadily improving scores on coding and mathematical benchmarks point to impressive reasoning capabilities and an ability to handle complex problems. However, are currently available models truly able to address real-world challenges, such as those found in the automotive industry? How well do they understand high-level, abstract instructions? Can they translate these instructions directly into functional code, or do they still need help and supervision? In this work, we put one of the current state-of-the-art models to the test, evaluating its performance on the task of translating abstract requirements, extracted from automotive standards and documents, into configuration code for CARLA simulations.