Large Language Models (LLMs) have garnered significant attention for their ability to understand text and images, generate human-like text, and perform complex reasoning tasks. However, how well they generalize this reasoning to decision-making in dynamic situations, where several kinds of reasoning must be combined over natural-language input, requires further exploration. In this study, we investigate how well LLMs can adapt and apply a combination of arithmetic and common-sense reasoning, particularly in autonomous driving scenarios. We hypothesize that LLMs' hybrid reasoning abilities can improve autonomous driving by enabling them to analyze detected objects and sensor data, understand driving regulations and physical laws, and offer additional context. This targets complex scenarios, such as decisions under low visibility caused by weather conditions, where traditional methods might fall short. We evaluated the LLMs on accuracy by comparing their answers with human-generated ground truth in the CARLA simulator. The results showed that when a combination of images (detected objects) and sensor data is fed into the LLM, it can offer precise information for brake and throttle control in autonomous vehicles across various weather conditions. This formulation and the resulting answers can assist decision-making in auto-pilot systems.