Advancements in Large Language Models (LLMs) have opened transformative possibilities for human-robot interaction, especially in collaborative environments. However, real-time human-AI collaboration requires agents to dynamically adapt to unseen human behaviors while maintaining effective communication. Existing benchmarks fall short in evaluating such adaptability for embodied agents, focusing mostly on the task performance of the agent itself. To address this gap, we propose a novel benchmark that assesses agents' reactive adaptability and instantaneous communication capabilities at every step. Based on this benchmark, we propose the Monitor-then-Adapt framework (MonTA), which combines strong adaptability and communication with real-time execution. MonTA contains three key LLM modules: a lightweight \textit{Monitor} that checks, at high frequency, whether adaptation is needed, and two proficient \textit{Adapters} that perform subtask and path adaptation reasoning at low frequency. Our results demonstrate that MonTA outperforms other baseline agents on our proposed benchmark. Further user studies confirm that our framework provides highly reasonable adaptation plans and consistent language instructions.
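For concreteness, a minimal sketch of the control loop implied by this design is given below, assuming a per-step monitor flag and two on-demand adapters; all class names, the observation format, and the adaptation rules are hypothetical placeholders rather than the paper's implementation.

\begin{verbatim}
# Illustrative sketch of a Monitor-then-Adapt loop (hypothetical API,
# not the paper's code): a lightweight Monitor runs at every step,
# and a heavier Adapter is invoked only when the Monitor raises a flag.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Plan:
    subtasks: list = field(default_factory=lambda: ["fetch", "deliver"])
    path: list = field(default_factory=lambda: [(0, 0), (1, 0)])

class Monitor:
    """Lightweight check run at every step (high frequency)."""
    def needs_adaptation(self, obs: dict, plan: Plan) -> Optional[str]:
        # Placeholder rules: flag which kind of adaptation is needed.
        if obs.get("human_changed_goal"):
            return "subtask"
        if obs.get("human_blocks_path"):
            return "path"
        return None

class SubtaskAdapter:
    """Proficient reasoner, invoked only when flagged (low frequency)."""
    def adapt(self, obs: dict, plan: Plan) -> Plan:
        plan.subtasks = ["assist_human"] + plan.subtasks  # placeholder
        return plan

class PathAdapter:
    def adapt(self, obs: dict, plan: Plan) -> Plan:
        plan.path = [(0, 1), (1, 1)]  # placeholder replanned route
        return plan

def step(obs: dict, plan: Plan, monitor: Monitor,
         sub: SubtaskAdapter, path: PathAdapter) -> Plan:
    flag = monitor.needs_adaptation(obs, plan)  # cheap, every step
    if flag == "subtask":
        plan = sub.adapt(obs, plan)             # rare, slower reasoning
    elif flag == "path":
        plan = path.adapt(obs, plan)
    return plan

if __name__ == "__main__":
    plan = step({"human_blocks_path": True}, Plan(),
                Monitor(), SubtaskAdapter(), PathAdapter())
    print(plan.path)
\end{verbatim}

The point of the split is to keep expensive LLM reasoning off the per-step critical path: only the cheap monitoring call runs at every step, while adaptation reasoning is triggered sparsely.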