Reading fluency assessment is a critical component of literacy programmes, serving to guide and monitor early education interventions. Because the exercise is resource-intensive when conducted by teachers, automatic tools that operate on audio recordings of oral reading are attractive as an objective and highly scalable solution. Human judgements of reading fluency rest on multiple complex aspects, such as accuracy, rate and expressiveness. In this work, we investigate end-to-end modelling on a training dataset of children's audio recordings of story texts labelled by human experts. The pre-trained wav2vec2.0 model is adopted due to its potential to alleviate the challenges posed by the limited amount of labelled data. We report the performance of a number of system variations on the relevant measures, and also probe the learned embeddings for lexical and acoustic-prosodic features known to be important to the perception of reading fluency.