Voice assistants (VAs) are typically evaluated through task performance metrics and self-report questionnaires, but people's voices themselves carry rich paralinguistic cues that reveal affect, effort, and interaction breakdowns. We present a within-subjects study (N=49) that systematically compared three VA personas across three usage scenarios to investigate whether speech-derived audio features can serve as a proxy for user experience (UX). We analyzed participants' speech for temporal, spectral, and linguistic markers and collected standardized UX measures, brief mood and stress ratings, and a post-study questionnaire. We found correlations between specific speech features and self-reported satisfaction and experience. Furthermore, a machine learning model trained on speech features achieved promising accuracy in classifying UX levels, suggesting that speech-based classification might be a reasonable alternative to self-report instruments. Our findings establish speech as a viable, real-time signal for implicitly measuring UX and point toward adaptive voice user interfaces (VUIs) that respond dynamically to emotional and usability-related vocal cues.