This paper presents VisioPhysioENet, a novel multimodal system that leverages visual and physiological signals to detect learner engagement. It employs a two-level approach to extract both visual and physiological features. For visual feature extraction, Dlib detects facial landmarks, while OpenCV provides additional estimations. The face recognition library, built on Dlib, identifies the facial region of interest used specifically for physiological signal extraction; physiological signals are then extracted with the plane-orthogonal-to-skin (POS) method to assess cardiovascular activity. These features are integrated using advanced machine learning classifiers, enhancing the detection of various engagement levels. We thoroughly evaluated VisioPhysioENet on the DAiSEE dataset, where it achieved an accuracy of 63.09%, identifying different levels of engagement more reliably than many existing methods and outperforming by 8.6% the only other model that uses both physiological and visual features.
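The abstract's physiological branch relies on the plane-orthogonal-to-skin (POS) method of remote photoplethysmography, which projects temporally normalized RGB traces of the facial region onto a plane orthogonal to the skin-tone direction. The following is a minimal sketch of the standard POS algorithm, not the paper's exact implementation; the function name, the 1.6 s window, and the small epsilon guard are illustrative assumptions.

```python
import numpy as np

def pos_pulse(rgb, fps, win_sec=1.6):
    """Sketch of plane-orthogonal-to-skin (POS) pulse extraction.

    rgb: array of shape (N, 3), per-frame spatial mean RGB of the facial ROI.
    Returns a 1-D pulse signal of length N built by overlap-adding windows.
    """
    n = rgb.shape[0]
    l = int(win_sec * fps)                     # sliding-window length (~1.6 s)
    P = np.array([[0.0, 1.0, -1.0],            # projection axes orthogonal
                  [-2.0, 1.0, 1.0]])           # to the skin-tone direction
    h = np.zeros(n)
    for t in range(n - l + 1):
        C = rgb[t:t + l].T                     # (3, l) RGB window
        Cn = C / C.mean(axis=1, keepdims=True) # temporal normalization
        S = P @ Cn                             # two projected signals
        # alpha tuning combines the projections into one pulse estimate
        p = S[0] + (S[0].std() / (S[1].std() + 1e-9)) * S[1]
        h[t:t + l] += p - p.mean()             # zero-mean overlap-add
    return h
```

In practice the resulting signal would be band-pass filtered around plausible heart-rate frequencies before deriving cardiovascular features for the engagement classifiers.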