An Interpretable Closed-Loop Intelligent Tutoring System for Multimodal Affective Feedback in Asynchronous Presentation Training

This paper presents an interpretable closed-loop Intelligent Tutoring System (ITS) that supports feedback-guided practice for developing on-camera oral presentation skills at scale. The system operationalizes a seven-dimensional Behaviorally Anchored Rating Scale (BARS) and implements a three-layer interpretable feedback architecture that connects rubric-aligned multimodal scoring, audience-perceived expressive diagnostics, and retrieval-augmented conversational coaching to support deliberate practice. Built on an XGBoost backbone, the ITS maps multimodal inputs (facial, vocal, textual, and oculomotor features) into evidence-based feedback that can be traced back to observable performance cues. Trained on 10,360 Massive Open Online Course (MOOC) video segments, the system achieved rubric-aligned scoring with performance levels comparable to expert ratings (R² = 0.48–0.61, Spearman's rho = 0.69–0.78, MAE = 0.43–0.57). In a pre-post validation study with 204 adult learners over a 30-day practice window, participants demonstrated significant improvements across all seven BARS dimensions (Cohen's d = 0.39–0.90), with practice frequency showing a strong positive association with posttest performance after controlling for baseline scores and demographics. The results demonstrate how multimodal analytic outputs can be systematically transformed into observable behavioral change through an integrated feedback architecture, advancing explainable and pedagogically grounded ITS design for performance-based competencies.
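For readers less familiar with the statistics reported above, the following minimal sketch shows how R², Spearman's rho, MAE, and Cohen's d are conventionally computed. It is an illustration only, not the system's evaluation code. The function names and toy scores are hypothetical, and the pooled-standard-deviation form of Cohen's d is an assumption, since the abstract does not state which variant was used.

```python
from math import sqrt
from statistics import mean, stdev


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error between expert and model scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cohens_d(pre, post):
    """Cohen's d with a pooled SD (assumed variant, not confirmed by the paper)."""
    pooled = sqrt((stdev(pre) ** 2 + stdev(post) ** 2) / 2)
    return (mean(post) - mean(pre)) / pooled


# Hypothetical toy scores for one BARS dimension (not data from the study).
expert = [1, 2, 3, 4, 5]
model = [1.5, 2.0, 2.5, 4.5, 4.0]
print(r_squared(expert, model), spearman_rho(expert, model), mae(expert, model))
```

In practice these quantities would be computed separately for each of the seven BARS dimensions, which is why the abstract reports each metric as a range.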
