This paper addresses a critical challenge in robotics: enabling robots to operate seamlessly in human environments through natural language interaction. Our primary focus is to equip robots with the ability to understand and execute complex instructions given in coherent dialogs, so that they can solve intricate tasks. To explore this, we build upon the Execution from Dialog History (EDH) task from the TEACh benchmark. We employ a multi-transformer model with a BART language model. Our best configuration outperforms the baseline, achieving a success rate of 8.85 and a goal-conditioned success rate of 14.02. In addition, we suggest an alternative methodology for completing this task. Moreover, we introduce a new task that extends EDH by predicting game plans instead of individual actions. We evaluate multiple BART models and a LLaMA2 LLM on this new task; the latter achieves a ROUGE-L score of 46.77.
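For readers unfamiliar with the ROUGE-L metric mentioned in the abstract, the following is a minimal sketch of how such a score can be computed from the longest common subsequence (LCS) of a reference plan and a predicted plan. It assumes whitespace tokenization, lowercasing, and the balanced F-measure scaled to 0-100; the abstract does not state which ROUGE-L variant or tokenizer was used, so this is an illustration and not the authors' evaluation code. The function names and the example plans are hypothetical.

```python
def lcs_length(a, b):
    """Return the length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)  # DP row for the previous token of `a`
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 on a 0-100 scale (assumed variant: whitespace tokens, beta = 1)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of predicted tokens that are in the LCS
    recall = lcs / len(ref)      # share of reference tokens that are in the LCS
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical game plans, not taken from the TEACh data.
reference_plan = "pick up the mug then place it in the sink"
predicted_plan = "pick up the mug and put it in the sink"
score = rouge_l_f1(reference_plan, predicted_plan)
print(f"ROUGE-L F1: {score:.2f}")
```

In this example, 8 of the 10 tokens in each plan match in order, so precision and recall are both 0.8 and the score is about 80. Scores on a real corpus are typically averaged over all reference and prediction pairs.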