Most Multimodal Sentiment Analysis research has focused on point-wise regression. Although straightforward, this approach is sensitive to label noise and ignores whether one sample is more positive than another, which leads to unstable predictions and weak correlation between predictions and labels. Pairwise ordinal learning frameworks emerged to address this gap by learning relative order from comparisons. However, they have two limitations of their own. First, they assign uniform importance to all comparisons and so cannot adaptively focus on hard-to-rank samples. Second, they employ static ranking margins, which fail to reflect the varying semantic distances between sentiment groups. To address these limitations, we propose a Two-Stage Group-wise Ranking and Calibration Framework (GRCF) that adapts the philosophy of Group Relative Policy Optimization (GRPO). GRCF simultaneously preserves relative ordinal structure, ensures absolute score calibration, and adaptively focuses on difficult samples. Specifically, Stage 1 introduces a GRPO-inspired Advantage-Weighted Dynamic Margin Ranking Loss to build a fine-grained ordinal structure. Stage 2 then employs a mean absolute error (MAE)-driven objective to align prediction magnitudes. To validate its generalizability, we extend GRCF to classification tasks, including multimodal humor detection and sarcasm detection. GRCF achieves state-of-the-art performance on core regression benchmarks and shows strong generalizability on classification tasks.
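The two stages can be illustrated with a minimal NumPy sketch. The abstract does not give the exact form of the losses, so everything below is an assumption made for illustration: the function names, the `margin_scale` parameter, the choice of batch-standardized absolute error as the "advantage", and a margin that grows linearly with the label gap. It is not the authors' implementation.

```python
import numpy as np


def advantage_weights(preds, labels, eps=1e-8):
    """Hypothetical group-relative advantage: absolute error standardized
    within the batch, so samples with above-average error get weight > 1."""
    err = np.abs(preds - labels)
    adv = (err - err.mean()) / (err.std() + eps)
    return 1.0 + np.maximum(adv, 0.0)


def stage1_ranking_loss(preds, labels, margin_scale=0.5):
    """Assumed Stage 1 form: pairwise hinge loss whose margin scales with
    the label gap (dynamic margin) and whose pairs are weighted by the
    mean advantage weight of the two samples."""
    w = advantage_weights(preds, labels)
    label_gap = labels[:, None] - labels[None, :]  # y_i - y_j
    pred_gap = preds[:, None] - preds[None, :]     # s_i - s_j
    mask = label_gap > 0                           # pairs where i should outrank j
    if not mask.any():
        return 0.0
    margin = margin_scale * label_gap              # larger gap, larger margin
    hinge = np.maximum(0.0, margin - pred_gap)
    pair_w = 0.5 * (w[:, None] + w[None, :])
    return float((pair_w * hinge)[mask].sum() / pair_w[mask].sum())


def stage2_mae_loss(preds, labels):
    """Stage 2 calibration objective: mean absolute error."""
    return float(np.mean(np.abs(preds - labels)))


labels = np.array([-2.0, 0.0, 2.0])
shifted = labels + 1.0      # correct order, wrong magnitudes
reversed_preds = labels[::-1]

print(stage1_ranking_loss(shifted, labels), stage2_mae_loss(shifted, labels))
print(stage1_ranking_loss(reversed_preds, labels))
```

In this sketch, predictions that are uniformly shifted from the labels incur no ranking loss but a nonzero MAE, which is one way to see why a ranking stage alone would not calibrate absolute scores and a second, magnitude-aligning stage is useful.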