Today's advanced automotive systems are evolving into intelligent cyber-physical systems (CPS), bringing computational intelligence to their cyber-physical context. Such systems power advanced driver assistance systems (ADAS), which rely on observing a vehicle's surroundings to function. However, ADAS face clear limitations when the direct line-of-sight to surrounding objects is occluded, as is common in urban areas. Automated driving (AD) systems could ideally benefit from other vehicles' fields-of-view in such occluded situations to increase traffic safety, for example, if pedestrian locations could be shared across vehicles. The current literature suggests vehicle-to-infrastructure (V2I) communication via roadside units (RSUs) or vehicle-to-vehicle (V2V) communication to address such issues by streaming sensor or object data between vehicles. Given the ongoing revolution in vehicle system architectures towards powerful, centralized processing units with hardware accelerators, onboard large language models (LLMs) that improve passenger comfort through voice assistants are becoming a reality. We suggest and evaluate a concept to complement the ego vehicle's field-of-view (FOV) with another vehicle's FOV by tapping into that vehicle's onboard LLM, letting the machines have a dialogue about what the other vehicle ``sees''. Our results show that recent LLMs, such as GPT-4V and GPT-4o, understand a traffic situation to an impressive level of detail and can hence even be used to spot traffic participants. However, better prompts are needed to improve the detection quality, and future work is needed towards a standardised message interchange format between vehicles.
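To make the proposed vehicle-to-vehicle LLM dialogue concrete, the sketch below assembles a request that asks a vision-capable LLM (e.g. GPT-4o) which traffic participants are visible in another vehicle's camera frame. This is a minimal illustration, not the paper's implementation: the request shape follows OpenAI's public Chat Completions API with base64 image input, while the `build_fov_query` helper, the prompt wording, and the endpoint usage are assumptions for illustration only.

```python
import base64
import json

# Hypothetical endpoint; in the paper's concept this request would instead be
# answered by the other vehicle's onboard LLM.
OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions"


def build_fov_query(jpeg_bytes: bytes, model: str = "gpt-4o") -> dict:
    """Assemble one 'what do you see?' request for a vision LLM.

    The camera frame is embedded as a base64 data URL, following the
    Chat Completions image-input format.
    """
    image_b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "List all traffic participants (pedestrians, "
                            "cyclists, vehicles) visible in this camera "
                            "frame, with their rough positions."
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{image_b64}"
                        },
                    },
                ],
            }
        ],
    }


# The ego vehicle would POST this payload (with credentials) and parse the
# textual answer to learn about occluded objects, e.g. pedestrians.
payload = build_fov_query(b"\xff\xd8\xff\xe0 placeholder jpeg bytes")
print(json.dumps(payload)[:80])
```

In the envisioned setup, the reply is free-form text, which is why the abstract calls for better prompts and a standardised message interchange format: a structured answer schema would make the exchanged object information machine-usable across vendors.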