The widespread adoption of automatic sentiment and emotion classifiers makes it important to ensure that these tools perform reliably across different populations. Yet their reliability is typically assessed using benchmarks that rely on third-party annotators rather than the individuals experiencing the emotions themselves, potentially concealing systematic biases. In this paper, we use a unique, large-scale dataset of more than one million self-annotated posts and a pre-registered research design to investigate gender biases in emotion detection across 414 combinations of models and emotion-related classes. We find that across different types of automatic classifiers and various underlying emotions, error rates are consistently higher for texts authored by men compared to those authored by women. We quantify how this bias could affect results in downstream applications and show that current machine learning tools, including large language models, should be applied with caution when the gender composition of a sample is not known or variable. Our findings demonstrate that sentiment analysis is not yet a solved problem, especially in ensuring equitable model behaviour across demographic groups.