In-the-wild expression recognition persistently fails on a few rare emotions, and the standard explanation is class imbalance. Through a controlled multi-task study on two benchmarks, we show that the failure is instead a property of affect geometry: the rare classes are degenerate on Russell's circumplex, and that degeneracy bounds what any loss or cost can achieve. Our instrument is a circumplex-cost optimal-transport term that prices expression confusions by their valence-arousal distance. The term improves the official score and expression macro-F1, but a control that most studies omit shows the gain is not geometric: a uniform cost, equivalent to a generic confidence penalty, matches it on Aff-Wild2 (p=0.625) and significantly exceeds it on AffectNet (+0.057 over the baseline, a larger gain than the circumplex cost achieves). What the geometry does reshape is the structure of the errors, making them affectively nearer to the truth on Aff-Wild2 (p=0.031 against the uniform control). This effect does not survive on AffectNet, where a visual confound at the far corner of the circumplex overwhelms it. The rare-class failure, by contrast, is stable across both datasets we examine: the degenerate pairs (anger-fear on Aff-Wild2, anger-contempt on AffectNet) resist frequency-based interventions, the transport term, and an action-unit-augmented cost built specifically to separate them. We conclude that progress on rare expressions requires representations that distinguish the classes, not supervision that reprices their confusions, and we provide the controls and metrics needed to tell the two apart.
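To make the contrast between the circumplex cost and the uniform control concrete, the following is a minimal sketch and not the implementation used in the study. It assumes a hard (one-hot) label, in which case the optimal-transport cost between the predicted distribution and the label reduces to the expected cost of moving all predicted mass onto the true class. The valence-arousal coordinates, the class list, and the function names `circumplex_cost`, `uniform_cost`, and `transport_penalty` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical valence-arousal coordinates, for illustration only.
# These are NOT the values used in the study.
VA = {
    "neutral":   (0.0, 0.0),
    "happiness": (0.8, 0.5),
    "sadness":   (-0.7, -0.4),
    "surprise":  (0.2, 0.8),
    "anger":     (-0.6, 0.7),
    "fear":      (-0.6, 0.8),
}
CLASSES = list(VA)


def circumplex_cost(coords):
    """Pairwise Euclidean distance in valence-arousal space, scaled to [0, 1]."""
    pts = np.asarray(coords, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    cost = np.linalg.norm(diff, axis=-1)
    return cost / cost.max()


def uniform_cost(num_classes):
    """Control cost: every confusion is priced at 1, correct predictions at 0."""
    return 1.0 - np.eye(num_classes)


def transport_penalty(probs, target, cost):
    """Transport cost from a predicted distribution to a one-hot target.

    With a point-mass target the only feasible plan moves all predicted
    mass to the target class, so the cost is sum_j probs[j] * cost[j, target].
    """
    return float(np.asarray(probs, dtype=float) @ cost[:, target])


C_circ = circumplex_cost([VA[c] for c in CLASSES])
C_unif = uniform_cost(len(CLASSES))

anger = CLASSES.index("anger")
probs = np.array([0.05, 0.05, 0.05, 0.05, 0.30, 0.50])  # most mass on fear

penalty_circ = transport_penalty(probs, anger, C_circ)
penalty_unif = transport_penalty(probs, anger, C_unif)
```

Under the uniform cost the penalty equals one minus the probability assigned to the true class, which is why it behaves as a generic confidence penalty. Under the circumplex cost, mass placed on a class that sits close to the target in valence-arousal space (anger and fear in these illustrative coordinates) is barely penalized, which is one way to see why repricing confusions cannot separate a degenerate pair.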