Text emotion detection is a crucial foundation for advancing artificial intelligence from basic comprehension toward emotional reasoning. Most existing emotion detection datasets rely on manual annotation, which is costly, highly subjective, and prone to severe label imbalance. These problems are most evident in the inadequate annotation of micro-emotions and the absence of any representation of emotional intensity: such datasets fail to capture the rich emotions embedded in sentences, which degrades performance on downstream tasks. We propose an all-label and training-set-label regression method that maps label values to energy intensity levels, thereby fully leveraging the learning capacity of machine learning models and the interdependencies among labels to uncover multiple emotions within a sample. On this basis, we establish the Emotion Quantization Network (EQN) framework for micro-emotion detection and annotation. Comparative experiments with various models on five commonly used sentiment datasets validate the broad applicability of the framework across NLP machine learning models. We then apply the EQN framework to emotion detection and annotation on the GoEmotions dataset. A comprehensive comparison with the results reported in the literature from Google shows that the EQN framework is highly capable of automatically detecting and annotating micro-emotions. The EQN framework is the first to achieve automatic micro-emotion annotation with energy-level scores, providing strong support for further emotion detection analysis and for quantitative research in emotion computing.