The integration of large language models (LLMs) with social robots has emerged as a promising avenue for enhancing human-robot interactions at a time when news reports generated by artificial intelligence (AI) are gaining in credibility. This integration is expected to intensify and become a more productive resource for journalism, media, communication, and education. In this paper a novel system is proposed that integrates AI's generative pretrained transformer (GPT) model with the Pepper robot, with the aim of improving the robot's natural language understanding and response generation capabilities for enhanced social interactions. By leveraging GPT's powerful language processing capabilities, this system offers a comprehensive pipeline that incorporates voice input recording, speech-to-text transcription, context analysis, and text-to-speech synthesis action generation. The Pepper robot is enabled to comprehend user queries, generate informative responses with general knowledge, maintain contextually relevant conversations, and act as a more domain-oriented news reporter. It is also linked with a news resource and powered with a Google search capability. To evaluate the performance of the framework, experiments were conducted involving a set of diverse questions. The robot's responses were assessed on the basis of eight criteria, including relevance, context, and fluency. Despite some identified limitations, this system contributes to the field of journalism and human-robot interaction by showcasing the potential of integrating LLMs with social robots. The proposed framework opens up opportunities for improving the conversational capabilities of robots, enabling interactions that are smoother, more engaging, and more context aware.
翻译:大语言模型(LLMs)与社交机器人的集成,在人工智能(AI)生成的新闻报道日益获得可信度的当下,已成为增强人机交互的一个有前景的途径。这种集成预计将进一步加强,并成为新闻、媒体、传播及教育领域更具生产力的资源。本文提出了一种新颖系统,将AI的生成式预训练Transformer(GPT)模型与Pepper机器人集成,旨在提升机器人的自然语言理解与响应生成能力,以增强社交交互。通过利用GPT强大的语言处理能力,该系统提供了涵盖语音输入录制、语音转文本转录、上下文分析及文本转语音合成动作生成的完整流水线。Pepper机器人能够理解用户查询、借助通用知识生成信息丰富的回答、维持上下文相关的对话,并作为更具领域导向性的新闻记者运作。该系统还与新闻资源相连,并具备谷歌搜索能力。为评估该框架的性能,进行了一系列涉及多样化问题的实验。根据相关性、上下文及流畅性等八项标准评估了机器人的响应。尽管存在若干已识别的局限性,该系统通过展示LLMs与社交机器人集成的潜力,为新闻学与人机交互领域做出了贡献。所提出的框架为提升机器人的对话能力开辟了机遇,使交互更加流畅、更具参与性且更富上下文感知。