When AI Meets Early Childhood Education: Large Language Models as Assessment Teammates in Chinese Preschools

High-quality teacher-child interaction (TCI) is fundamental to early childhood development, yet traditional expert-based assessment faces a critical scalability challenge. In large systems like China's, which serves 36 million children across more than 250,000 kindergartens, the cost and time requirements of manual observation make continuous quality monitoring infeasible, relegating assessment to infrequent episodic audits that limit timely intervention and improvement tracking. In this paper, we investigate whether AI can serve as a scalable assessment teammate by extracting structured quality indicators and validating their alignment with human expert judgments. Our contributions include: (1) TEPE-TCI-370h (Tracing Effective Preschool Education), the first large-scale dataset of naturalistic teacher-child interactions in Chinese preschools (370 hours, 105 classrooms) with standardized ECQRS-EC and SSTEW annotations; (2) Interaction2Eval, a specialized LLM-based framework that addresses domain-specific challenges (child speech recognition, Mandarin homophone disambiguation, and rubric-based reasoning) and achieves up to 88% agreement; (3) deployment validation across 43 classrooms demonstrating an 18x efficiency gain in the assessment workflow, highlighting the potential for shifting from annual expert audits to monthly AI-assisted monitoring with targeted human oversight. This work not only demonstrates the technical feasibility of scalable, AI-augmented quality assessment but also lays the foundation for a new paradigm in early childhood education, one where continuous, inclusive, AI-assisted evaluation becomes the engine of systemic improvement and equitable growth.

