The absence of explicit communication channels between automated vehicles (AVs) and other road users requires external Human-Machine Interfaces (eHMIs) to convey messages effectively in uncertain scenarios. Currently, most eHMI studies rely on predefined text messages and manually designed actions to deliver these messages, which limits the real-world deployment of eHMIs, where adaptability to dynamic scenarios is essential. Given the generalizability and versatility of large language models (LLMs), they could serve as automated action designers for the message-action design task. To validate this idea, we make three contributions: (1) We propose a pipeline that integrates LLMs and 3D renderers, using LLMs as action designers to generate executable actions for controlling eHMIs and rendering action clips. (2) We collect a user-rated Action-Design Scoring dataset comprising 320 action sequences for eight intended messages and four representative eHMI modalities. The dataset shows that LLMs, particularly reasoning-enabled ones, can translate intended messages into actions at a near-human level. (3) We introduce two automated raters, the Action Reference Score (ARS) and Vision-Language Models (VLMs), to benchmark 18 LLMs, finding that VLM ratings align with human preferences yet vary across eHMI modalities.