Online learning is a rapidly growing industry, largely because of its convenience. A major open question, however, is whether students are as engaged online as they are in face-to-face classes. An engagement recognition system could significantly improve the learning experience in online classes. Engagement detection currently faces three challenges: poor label quality in existing datasets, intra-class variation, and extreme data imbalance. To address these problems, we present the CMOSE dataset, which contains a large amount of data across different engagement levels, with high-quality labels generated following psychological advice. We demonstrate its transferability by analyzing model performance on other engagement datasets. We also develop a training mechanism, MocoRank, to handle intra-class variation, the ordinal relationship between classes, and data imbalance. MocoRank outperforms prior engagement detection losses, improving overall accuracy by 1.32% and average accuracy by 5.05%. We further demonstrate the effectiveness of multi-modality through ablation studies on pre-trained video features, high-level facial features, and audio features.
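The abstract reports both overall accuracy and average accuracy, and the two can diverge sharply under extreme class imbalance. The following is a minimal sketch of that difference, assuming "average accuracy" means the unweighted mean of per-class accuracies (the abstract does not define it). The function names and the toy labels are hypothetical and are not taken from CMOSE.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def average_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracies (assumed definition)."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    per_class = [correct[c] / total[c] for c in total]
    return sum(per_class) / len(per_class)


# Hypothetical toy data: 8 samples of a majority class (label 2) and
# 2 samples of a minority class (label 0). The classifier always
# predicts the majority class.
y_true = [2] * 8 + [0] * 2
y_pred = [2] * 10

print(overall_accuracy(y_true, y_pred))  # 0.8
print(average_accuracy(y_true, y_pred))  # 0.5
```

In this toy case a classifier that ignores the minority class still reaches 0.8 overall accuracy, while its average accuracy drops to 0.5. That is one reason a gain in average accuracy can be more informative than a gain in overall accuracy on imbalanced engagement data.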