Understanding speaker confidence is crucial in educational settings, where it can inform personalised feedback and improve learning outcomes. This study introduces a novel framework for detecting speaker confidence that integrates human-engineered features with embeddings from the Whisper encoder. To address the scarcity of labelled data, a pseudo-labelling technique is used to expand the labelled dataset, allowing the model to learn from both human-annotated and model-generated labels. The framework combines traditional speech features (pitch, volume, rate of speech, and the presence of disfluencies and stress) with Whisper embeddings and fuses these two representations using a co-attention mechanism, achieving an overall accuracy of 75%. This study advances speech analysis and enables applications that support personalised learning and the development of speaking skills.
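The abstract does not specify how the co-attention fusion is built, so the following is only a minimal NumPy sketch of one common co-attention formulation: a single affinity matrix between the two streams, normalised along each axis so that each stream attends over the other. It assumes frame- or segment-level hand-engineered features and Whisper encoder outputs have already been extracted. All dimensions, the projection matrices, the mean pooling, and the three-class output are hypothetical and randomly initialised here; in a trained model they would be learned.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def co_attention_fuse(hand, whisper, w_h, w_w):
    """Fuse two feature sequences with a simple co-attention block (sketch).

    hand:     (T_h, d_h) hand-engineered features per segment (hypothetical layout)
    whisper:  (T_w, d_w) Whisper encoder frames
    w_h, w_w: hypothetical projections to a shared width d
    Returns one fused utterance vector of length 2 * d.
    """
    h = hand @ w_h                 # (T_h, d)
    w = whisper @ w_w              # (T_w, d)
    affinity = np.tanh(h @ w.T)    # (T_h, T_w) pairwise affinity scores

    # Each hand-feature step attends over Whisper frames, and vice versa.
    h_ctx = softmax(affinity, axis=1) @ w      # (T_h, d) Whisper context per hand step
    w_ctx = softmax(affinity.T, axis=1) @ h    # (T_w, d) hand-feature context per frame

    # Mean-pool each attended stream over time and concatenate.
    return np.concatenate([h_ctx.mean(axis=0), w_ctx.mean(axis=0)])


rng = np.random.default_rng(0)
T_H, D_H = 20, 5      # hypothetical: 20 segments x 5 features (pitch, volume, rate, disfluency, stress)
T_W, D_W = 150, 512   # hypothetical: 150 Whisper frames of width 512
D = 64                # hypothetical shared projection width
N_CLASSES = 3         # hypothetical number of confidence levels

hand = rng.normal(size=(T_H, D_H))
whisper = rng.normal(size=(T_W, D_W))
w_h = rng.normal(scale=D_H ** -0.5, size=(D_H, D))
w_w = rng.normal(scale=D_W ** -0.5, size=(D_W, D))
w_out = rng.normal(scale=(2 * D) ** -0.5, size=(2 * D, N_CLASSES))

fused = co_attention_fuse(hand, whisper, w_h, w_w)
probs = softmax(fused @ w_out, axis=0)   # class probabilities from a linear head
print(fused.shape, probs.round(3))
```

Because the affinity matrix is computed once and normalised along each axis, each stream is summarised in terms of the other. This is what distinguishes co-attention from simply concatenating the two feature sets before classification.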