In recent years, intelligent autonomous robots have begun to appear in daily life and industrial production. Desktop-level robots offer flexible deployment, rapid response, and suitability for light-workload environments. To meet the current demand for service robot technology, this study proposes using a miniaturized, ROS-based desktop-level robot as the carrier, deploying a natural language model (NLP-BERT) locally, and integrating visual recognition (CV-YOLO) and speech recognition (ASR-Whisper) as inputs, so that the desktop robot can make decisions autonomously and act appropriately. Three comprehensive experiments were designed to validate the approach on the robotic arm, and it performed well in all three. In Task 1, the execution rates for speech recognition and action performance were 92.6% and 84.3%, respectively. In Task 2, the highest execution rates under the given conditions were 92.1% and 84.6%, and in Task 3 the highest execution rates were 95.2% and 80.8%, respectively. These results indicate that integrating ASR, NLP, and related technologies on edge devices is feasible and provides a technical and engineering foundation for multimodal desktop-level robots.