Today's advanced automotive systems are evolving into intelligent cyber-physical systems (CPS), bringing computational intelligence to their cyber-physical context. Such systems power advanced driver assistance systems (ADAS), which rely on observing a vehicle's surroundings to function. However, ADAS face clear limitations when the direct line-of-sight to surrounding objects is occluded, as is common in urban areas. Automated driving (AD) systems could ideally benefit from other vehicles' fields-of-view in such occluded situations to increase traffic safety, for example, if pedestrian locations could be shared across vehicles. The current literature suggests vehicle-to-infrastructure (V2I) communication via roadside units (RSUs) or vehicle-to-vehicle (V2V) communication to address such issues by streaming sensor or object data between vehicles. Given the ongoing revolution in vehicle system architectures towards powerful, centralized processing units with hardware accelerators, onboard large language models (LLMs) that improve passenger comfort through voice assistants are becoming a reality. We suggest and evaluate a concept to complement the ego vehicle's field-of-view (FOV) with another vehicle's FOV by tapping into that vehicle's onboard LLM, letting the machines have a dialogue about what the other vehicle ``sees''. Our results show that recent LLMs, such as GPT-4V and GPT-4o, understand a traffic situation to an impressive level of detail and can hence even be used to spot traffic participants. However, better prompts are needed to improve the detection quality, and future work is needed towards a standardised message interchange format between vehicles.
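To make the proposed vehicle-to-vehicle LLM dialogue concrete, the sketch below assembles a request that asks a vision-capable LLM (e.g. GPT-4o) which traffic participants are visible in another vehicle's camera frame. This is a minimal illustration, not the paper's implementation: the request shape follows OpenAI's public Chat Completions API with base64 image input, while the `build_fov_query` helper, the prompt wording, and the endpoint usage are assumptions for illustration only.

```python
import base64
import json

# Hypothetical endpoint; in the paper's concept this request would instead be
# answered by the other vehicle's onboard LLM.
OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions"


def build_fov_query(jpeg_bytes: bytes, model: str = "gpt-4o") -> dict:
    """Assemble one 'what do you see?' request for a vision LLM.

    The camera frame is embedded as a base64 data URL, following the
    Chat Completions image-input format.
    """
    image_b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "List all traffic participants (pedestrians, "
                            "cyclists, vehicles) visible in this camera "
                            "frame, with their rough positions."
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{image_b64}"
                        },
                    },
                ],
            }
        ],
    }


# The ego vehicle would POST this payload (with credentials) and parse the
# textual answer to learn about occluded objects, e.g. pedestrians.
payload = build_fov_query(b"\xff\xd8\xff\xe0 placeholder jpeg bytes")
print(json.dumps(payload)[:80])
```

In the envisioned setup, the reply is free-form text, which is why the abstract calls for better prompts and a standardised message interchange format: a structured answer schema would make the exchanged object information machine-usable across vendors.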