We address two persistent gaps in Emotion Recognition in Conversation (ERC): which modeling choices materially affect performance, and how recognition findings connect to interpretable discourse-level patterns. We study both through a systematic investigation on IEMOCAP with cross-dataset validation on MELD. For recognition, we run controlled ablations with 10 random seeds and paired significance tests with multiple-comparisons correction, yielding three findings. First, conversational context is the dominant factor, but performance saturates quickly: roughly 90% of the gain is captured within the most recent 10-30 preceding turns, depending on the label set. Second, hierarchical sentence representations help most in utterance-only settings and show a clear advantage on MELD, but their benefit disappears once turn-level context is available, suggesting that conversational history subsumes much of the intra-utterance structure. Third, integrating an external affective lexicon does not improve results, consistent with pretrained encoders already capturing most of the affective signal needed for ERC. Under a strictly causal setting, our simple models achieve strong performance (weighted F1 of 82.69% in the 4-way setting and 67.07% in the 6-way setting), showing that competitive accuracy is achievable without future turns. For linguistic analysis, we examine 5,286 discourse-marker occurrences and find a reliable association between emotion and marker position (p < .0001). Sad utterances show reduced left-periphery marker usage (21.9%) relative to other emotions (28-32%), consistent with accounts linking left-periphery markers to active discourse management. This aligns with our recognition results, where Sad benefits most from conversational context (+22 percentage points), suggesting sadness may be more context-dependent than emotions with stronger local pragmatic cues.
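The abstract does not state which paired test or which correction was used. As a minimal sketch of the comparison protocol, assuming an exact sign-flip permutation test over per-seed score differences and Holm's step-down correction (both are stand-ins chosen for illustration, and the function names are hypothetical), the procedure could look like this:

```python
from itertools import product


def paired_sign_flip_pvalue(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired per-seed scores.

    Assumption: scores_a[i] and scores_b[i] come from the same random seed.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems need one score per seed")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    # Under the null hypothesis each per-seed difference is equally likely
    # to carry either sign. With 10 seeds this enumerates 2**10 = 1024 cases.
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / total


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the rank-th smallest p-value by (m - rank) and keep the
        # adjusted values monotonically non-decreasing.
        running_max = max(running_max, (m - rank) * pvalues[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted
```

In use, each ablation would contribute one p-value from `paired_sign_flip_pvalue` (ablated versus full model, matched by seed), and the full list would then be passed to `holm_adjust` before comparing against the chosen significance level. With only 10 seeds the smallest p-value the exact test can produce is 2/1024, which bounds how many comparisons can survive correction.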