In this paper, we present Speak Ease: an augmentative and alternative communication (AAC) system that supports users' expressivity by integrating multimodal input, including text, voice, and contextual cues (conversational partner and emotional tone), with large language models (LLMs). Speak Ease combines automatic speech recognition (ASR), context-aware LLM-based output generation, and personalized text-to-speech to enable more natural-sounding and expressive communication. Through an exploratory feasibility study and a focus group evaluation with speech-language pathologists (SLPs), we assessed Speak Ease's potential to support expressivity in AAC. The findings highlight the priorities and needs of AAC users and the system's ability to enhance expressivity through more personalized and contextually relevant communication. This work offers insights into how multimodal inputs and LLM-driven features can improve AAC systems and support expressivity.