This paper presents an AI glasses system that integrates real-time voice processing, artificial intelligence (AI) agents, and cross-network streaming. The system employs a dual-agent architecture in which Agent 01 handles Automatic Speech Recognition (ASR) and Agent 02 manages AI processing through local Large Language Models (LLMs), Model Context Protocol (MCP) tools, and Retrieval-Augmented Generation (RAG). The system supports real-time RTSP streaming of voice and video data, eye-tracking data collection, and remote task execution through RabbitMQ messaging. The implementation demonstrates successful processing of voice commands with multilingual support, as well as cross-platform task execution.
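The dual-agent flow described above can be illustrated with a minimal sketch. It is not the system's implementation: the function names (`agent01_asr`, `agent02_plan`, `run_pipeline`, `worker`), the `Task` type, and the keyword-based tool selection are all hypothetical. A UTF-8 byte string stands in for audio, a lookup table stands in for the LLM, MCP, and RAG stage, and an in-process `queue.Queue` stands in for RabbitMQ.

```python
from dataclasses import dataclass
from queue import Queue
from typing import Callable, Dict


@dataclass
class Task:
    """Hypothetical message passed from Agent 02 to a remote executor."""
    tool: str
    argument: str


def agent01_asr(audio_chunk: bytes) -> str:
    # Stand-in for Agent 01: a real system would run an ASR model here.
    # Assumption: the "audio" is already UTF-8 text, so we only normalize it.
    return audio_chunk.decode("utf-8").strip().lower()


def agent02_plan(transcript: str, tools: Dict[str, Callable[[str], str]]) -> Task:
    # Stand-in for Agent 02: choose a tool from the first word of the
    # transcript instead of consulting an LLM, MCP tools, or RAG.
    verb, _, rest = transcript.partition(" ")
    if verb in tools:
        return Task(tool=verb, argument=rest)
    return Task(tool="echo", argument=transcript)


def run_pipeline(audio_chunk: bytes,
                 tools: Dict[str, Callable[[str], str]],
                 task_queue: "Queue[Task]") -> Task:
    # Agent 01 -> Agent 02 -> message queue (RabbitMQ in the described system).
    transcript = agent01_asr(audio_chunk)
    task = agent02_plan(transcript, tools)
    task_queue.put(task)
    return task


def worker(task_queue: "Queue[Task]",
           tools: Dict[str, Callable[[str], str]]) -> str:
    # Remote executor: take one task off the queue and run the named tool.
    task = task_queue.get()
    return tools[task.tool](task.argument)


# Hypothetical tool table used only for this illustration.
TOOLS: Dict[str, Callable[[str], str]] = {
    "open": lambda arg: f"opened {arg}",
    "echo": lambda arg: arg,
}
```

For example, calling `run_pipeline(b"Open Browser\n", TOOLS, q)` on an empty `Queue` named `q` enqueues a task for the `open` tool, which a later `worker(q, TOOLS)` call executes. Separating the planner from the executor with a queue is what lets the task run on a different machine than the glasses.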