This paper presents the "Speak & Improve Challenge 2025: Spoken Language Assessment and Feedback", a challenge associated with the ISCA SLaTE 2025 Workshop. The goal of the challenge is to advance research on spoken language assessment and feedback, with tasks covering both the underlying technology and language learning feedback. Alongside the challenge, the Speak & Improve (S&I) Corpus 2025 is being pre-released. It is a dataset of L2 learner English with holistic scores and language error annotation, collected from open (spontaneous) speaking tests on the Speak & Improve learning platform. The corpus consists of approximately 315 hours of audio from second language learners of English with holistic scores, and a 55-hour subset with manual transcriptions and error labels. The challenge has four shared tasks: Automatic Speech Recognition (ASR), Spoken Language Assessment (SLA), Spoken Grammatical Error Correction (SGEC), and Spoken Grammatical Error Correction Feedback (SGECF). Each task has a closed track, in which only a predetermined set of models and data sources may be used, and an open track, in which any public resource may be used. Participants may take part in one or more of the tasks. This paper describes the challenge, the S&I Corpus 2025, and the baseline systems released for the challenge.