The start of the decade brought a severe pandemic and, in response, a large-scale move of education into online settings. With the surge in the use of video conferencing platforms and of tools for gauging student understanding, instructors need a mechanism for assessing how well students grasp the subject and how they respond to educational stimuli. Current systems consider only a single cue and are not focused on the educational domain. A holistic measurement of students' reactions to the subject matter is therefore needed. This paper highlights the need for a multimodal approach to affect recognition and its deployment in the online classroom, considering four cues: posture and gesture, facial expression, eye tracking, and verbal recognition. It compares the machine learning models available for each cue and identifies the most suitable approach given the available datasets and the parameters of classroom footage. A multimodal approach based on weighted majority voting is proposed, combining the best-fitting model for each cue, selected on accuracy, ease of procuring a data corpus, sensitivity, and any major drawbacks.
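As a rough illustration of weighted majority voting over the four cues, the following minimal sketch fuses one predicted label per cue into a single decision. The cue names, affect labels, and weights are hypothetical placeholders chosen for the example; the abstract does not state the actual label set, the weights, or how the weights are derived.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting cues carry the largest total weight.

    predictions: mapping of cue name -> predicted label for that cue
    weights:     mapping of cue name -> non-negative weight (assumed values)
    """
    if not predictions:
        raise ValueError("at least one cue prediction is required")
    totals = defaultdict(float)
    for cue, label in predictions.items():
        # Cues without a configured weight contribute nothing to the vote.
        totals[label] += weights.get(cue, 0.0)
    # Iterate labels in sorted order so ties resolve deterministically
    # (the alphabetically first label wins a tie).
    return max(sorted(totals), key=totals.get)


# Hypothetical per-cue outputs for one student at one moment in time.
predictions = {
    "posture_gesture": "engaged",
    "facial": "confused",
    "eye_tracking": "engaged",
    "verbal": "confused",
}
# Hypothetical weights, e.g. reflecting each model's validation accuracy.
weights = {
    "posture_gesture": 0.20,
    "facial": 0.35,
    "eye_tracking": 0.25,
    "verbal": 0.20,
}

fused = weighted_majority_vote(predictions, weights)
print(fused)
```

In this example the two cues voting for "confused" carry more combined weight than the two voting for "engaged", so the fused label follows the more heavily weighted models rather than a simple head count.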