With the recent development of natural language generation models, termed large language models (LLMs), a potential use case has opened up to improve the way humans interact with robot assistants. These LLMs should be able to leverage their broad understanding to translate natural language commands into effective, task-appropriate, and safe robot task executions. In practice, however, these models suffer from hallucinations, which may cause safety issues or deviations from the task. In other domains, these issues have been mitigated through collaborative AI systems in which multiple LLM agents work together to collectively plan, code, and self-check outputs. In this research, multiple collaborative AI systems were tested against a single independent AI agent to determine whether the success in other domains would translate into improved human-robot interaction performance. The results show no clear trend between the number of agents and the success of the model. However, some collaborative AI agent architectures exhibit a greatly improved capacity to produce error-free code and to solve abstract problems.