Reading fluency assessment is a critical component of literacy programmes, serving to guide and monitor early education interventions. Given the resource-intensive nature of the exercise when conducted by teachers, automatic tools that operate on audio recordings of oral reading are attractive as an objective and highly scalable solution. Human judgements of reading fluency rest on multiple complex aspects such as accuracy, rate and expressiveness. In this work, we investigate end-to-end modeling on a training dataset of children's audio recordings of story texts labeled by human experts. The pre-trained wav2vec2.0 model is adopted due to its potential to alleviate the challenges posed by the limited amount of labeled data. We report the performance of a number of system variations on the relevant measures, and also probe the learned embeddings for lexical and acoustic-prosodic features known to be important to the perception of reading fluency.