Emotion detection is pivotal in human communication, as emotions significantly influence behavior, relationships, and decision-making. This study addresses text-based emotion detection using the GoEmotions dataset, which annotates Reddit comments with 27 distinct emotions. We map these emotions to Ekman's six basic categories: joy, anger, fear, sadness, disgust, and surprise. To identify the best-performing model for this task, we evaluate six machine learning models, three ensemble models, and a Long Short-Term Memory (LSTM) model. The results indicate that the Stacking classifier outperforms the other models in accuracy and overall performance. We also benchmark our models against EmoBERTa, a pre-trained emotion detection model, and find that our Stacking classifier is more effective. Finally, we deploy the Stacking classifier in a Streamlit web application, underscoring its potential for real-world text-based emotion analysis.
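The mapping from the 27 fine-grained GoEmotions labels to Ekman's six categories can be illustrated with a short sketch. It assumes the grouping distributed with the GoEmotions dataset; the exact mapping used in this study is not stated here, so the dictionary below is an assumption, and the names `EKMAN_MAPPING`, `FINE_TO_EKMAN`, and `to_ekman` are hypothetical and chosen for illustration only.

```python
# Assumed grouping of the 27 GoEmotions labels into Ekman's six categories
# (as distributed with the dataset; the study's own mapping may differ).
EKMAN_MAPPING = {
    "anger": ["anger", "annoyance", "disapproval"],
    "disgust": ["disgust"],
    "fear": ["fear", "nervousness"],
    "joy": [
        "joy", "amusement", "approval", "excitement", "gratitude", "love",
        "optimism", "relief", "pride", "admiration", "desire", "caring",
    ],
    "sadness": ["sadness", "disappointment", "embarrassment", "grief", "remorse"],
    "surprise": ["surprise", "realization", "confusion", "curiosity"],
}

# Invert the grouping so each fine-grained label points to its Ekman category.
FINE_TO_EKMAN = {
    fine: coarse
    for coarse, fine_labels in EKMAN_MAPPING.items()
    for fine in fine_labels
}


def to_ekman(labels):
    """Return the sorted, de-duplicated Ekman categories for a comment.

    Labels outside the mapping (for example "neutral") are dropped.
    """
    return sorted({FINE_TO_EKMAN[label] for label in labels if label in FINE_TO_EKMAN})


if __name__ == "__main__":
    # A comment annotated with two fine-grained emotions.
    print(to_ekman(["gratitude", "annoyance"]))
```

Because a Reddit comment can carry several fine-grained labels, the helper returns a list of categories, not a single one; how multi-label comments are reduced to one class for training is a separate design choice that this sketch leaves open.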