Speech is essential for human communication, yet millions of people face impairments such as dysarthria, stuttering, and aphasia, conditions that often lead to social isolation and reduced participation. Despite recent progress in automatic speech recognition (ASR) and text-to-speech (TTS) technologies, accessible web and mobile infrastructures for users with impaired speech remain limited, hindering the practical adoption of these advances in daily communication. To bridge this gap, we present SpeechAgent, a mobile assistive system designed to support people with speech impairments in everyday communication. The system integrates large language model (LLM)-driven reasoning with advanced speech processing modules, providing adaptive support tailored to diverse impairment types. To ensure real-world practicality, we develop a structured deployment pipeline that enables real-time speech processing on mobile and edge devices, achieving imperceptible latency while maintaining high accuracy and speech quality. Evaluation on real-world impaired speech datasets and edge-device latency profiling confirms that SpeechAgent delivers both effective and user-friendly performance, demonstrating its feasibility for personalized, day-to-day assistive communication.