High-quality teacher-child interaction (TCI) is fundamental to early childhood development, yet traditional expert-based assessment faces a critical scalability challenge. In large systems like China's, which serves 36 million children across more than 250,000 kindergartens, the cost and time requirements of manual observation make continuous quality monitoring infeasible, relegating assessment to infrequent audits that limit timely intervention and improvement tracking. In this paper, we investigate whether AI can serve as a scalable assessment teammate by extracting structured quality indicators and validating their alignment with human expert judgments. Our contributions are: (1) TEPE-TCI-370h (Tracing Effective Preschool Education), the first large-scale dataset of naturalistic teacher-child interactions in Chinese preschools (370 hours, 105 classrooms), with standardized ECQRS-EC and SSTEW annotations; (2) Interaction2Eval, a specialized LLM-based framework that addresses domain-specific challenges (child speech recognition, Mandarin homophone disambiguation, and rubric-based reasoning) and achieves up to 88% agreement with human expert judgments; and (3) a deployment validation across 43 classrooms that demonstrates an 18x efficiency gain in the assessment workflow, highlighting the potential to shift from annual expert audits to monthly AI-assisted monitoring with targeted human oversight. This work demonstrates the technical feasibility of scalable, AI-augmented quality assessment and lays the foundation for a new paradigm in early childhood education, one in which continuous, inclusive, AI-assisted evaluation becomes the engine of systemic improvement and equitable growth.