Real-world human dialogue is multi-round and diverse. Moreover, human instructions can be ambiguous, and human responses are unconstrained. Interactive robots therefore struggle to understand human intent and to generate suitable strategies for assisting people through manipulation. In this article, we propose Mani-GPT, a Generative Pre-trained Transformer (GPT) for interactive robotic manipulation. The proposed model can understand the environment from object information, infer human intent from dialogue, generate natural-language responses to human input, and produce appropriate manipulation plans to assist the human. This makes human-robot interaction more natural and human-like. In our experiments, Mani-GPT outperforms existing algorithms, achieving an accuracy of 84.6% in intent recognition and action decision-making. It also performs satisfactorily in real-world dialogue tests with users, achieving an average response accuracy of 70%.