Improving the quality of early childhood education through interactive at-home math learning systems, empowered by recent advances in conversational AI technologies, is slowly becoming a reality. With this motivation, we implement a multimodal dialogue system that supports play-based learning experiences at home, guiding kids to master basic math concepts. This work explores the Spoken Language Understanding (SLU) pipeline within a task-oriented dialogue system developed for Kid Space, with cascading Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) components evaluated on our home deployment data of kids going through gamified math learning activities. We validate the advantages of a multi-task architecture for NLU and experiment with a diverse set of pretrained language representations for the Intent Recognition and Entity Extraction tasks in the math learning domain. To recognize kids' speech in realistic home environments, we investigate several ASR systems, including the commercial Google Cloud solution and the latest open-source Whisper models of varying sizes. We evaluate the SLU pipeline by testing our best-performing NLU models on noisy ASR output to inspect the challenges of understanding children for math learning in authentic homes.
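The cascaded evaluation described above (NLU scored once on human transcripts and once on ASR output) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the rule-based `toy_nlu` stands in for a trained multi-task NLU model, the intent labels `answer-number` and `out-of-scope` are hypothetical, and the sample utterances and simulated ASR hypotheses are made up rather than taken from the Kid Space deployment data.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class NLUResult:
    intent: str
    entities: List[Tuple[str, str]]  # (entity_type, surface_text)


# Hypothetical number lexicon; not the entity schema used in the paper.
NUMBER_WORDS = {"one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"}


def toy_nlu(utterance: str) -> NLUResult:
    """Rule-based stand-in for a trained NLU model (illustration only)."""
    tokens = utterance.lower().split()
    entities = [("number", t) for t in tokens
                if t in NUMBER_WORDS or t.isdigit()]
    intent = "answer-number" if entities else "out-of-scope"
    return NLUResult(intent, entities)


def slu_pipeline(audio_id: str,
                 asr: Callable[[str], str],
                 nlu: Callable[[str], NLUResult]) -> NLUResult:
    """Cascade: ASR maps audio to text, then NLU labels that text."""
    return nlu(asr(audio_id))


def intent_accuracy(predictions: List[NLUResult],
                    gold_intents: List[str]) -> float:
    """Fraction of utterances whose predicted intent matches the gold label."""
    correct = sum(p.intent == g for p, g in zip(predictions, gold_intents))
    return correct / len(gold_intents)


# Made-up samples: (audio id, human transcript, simulated ASR output, gold intent)
SAMPLES = [
    ("utt1", "i have five flowers", "i have five flowers", "answer-number"),
    ("utt2", "it is eight", "it is ate", "answer-number"),
    ("utt3", "can we play again", "can we play again", "out-of-scope"),
]

# Simulated ASR: a lookup table instead of a real recognizer.
fake_asr = {audio_id: hyp for audio_id, _, hyp, _ in SAMPLES}
gold = [intent for _, _, _, intent in SAMPLES]

# NLU on clean human transcripts versus NLU on (simulated) noisy ASR output.
clean_preds = [toy_nlu(ref) for _, ref, _, _ in SAMPLES]
noisy_preds = [slu_pipeline(audio_id, fake_asr.__getitem__, toy_nlu)
               for audio_id, _, _, _ in SAMPLES]

clean_acc = intent_accuracy(clean_preds, gold)
noisy_acc = intent_accuracy(noisy_preds, gold)
print(f"intent accuracy on transcripts: {clean_acc:.2f}")
print(f"intent accuracy on ASR output:  {noisy_acc:.2f}")
```

In this toy setup a single recognition error ("eight" heard as "ate") removes the number entity and flips the predicted intent, which is the kind of error propagation a cascaded SLU evaluation is meant to expose. A real setup would replace the lookup table with an actual ASR system and `toy_nlu` with a trained model.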