While Emotion Recognition in Conversation (ERC) has achieved high accuracy, two critical gaps remain: a limited understanding of \textit{which} architectural choices actually matter, and a lack of linguistic analysis connecting recognition to generation. We address both gaps through a systematic analysis of the IEMOCAP dataset. For recognition, we conduct a rigorous ablation study with 10-seed evaluation and report three key findings. First, conversational context is paramount, with performance saturating rapidly: 90\% of the total gain is achieved within the most recent 10--30 preceding turns (depending on the label set). Second, hierarchical sentence representations help at the utterance level, but this benefit disappears once conversational context is provided, suggesting that context subsumes intra-utterance structure. Third, external affective lexicons (SenticNet) provide no gain, indicating that pre-trained encoders already capture the necessary emotional semantics. With simple architectures using strictly causal context, we achieve 82.69\% (4-way) and 67.07\% (6-way) weighted F1, outperforming prior text-only methods, including those using bidirectional context. For linguistic analysis, we analyze 5,286 discourse marker occurrences and find a significant association between emotion and marker positioning ($p < .0001$). Notably, ``sad'' utterances exhibit reduced left-periphery marker usage (21.9\%) compared to other emotions (28--32\%), consistent with theories linking left-periphery markers to active discourse management. This connects to our recognition finding that sadness benefits most from context (+22 percentage points): lacking explicit pragmatic signals, sad utterances require conversational history for disambiguation.
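The emotion-by-marker-position association reported above is the kind of result a Pearson chi-square test of independence on a contingency table yields. A minimal sketch follows; the counts are illustrative placeholders, not the paper's actual IEMOCAP data, and the function name is ours.

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for an r x c contingency table
    (list of rows of observed counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected count under the independence hypothesis
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Placeholder counts -- rows: emotions (sad, happy, angry, neutral);
# columns: marker position (left periphery, elsewhere).
counts = [[120, 430], [310, 690], [280, 620], [300, 640]]
stat = chi_square_stat(counts)
dof = (len(counts) - 1) * (len(counts[0]) - 1)  # (r-1)(c-1) = 3 here
# Compare stat against the chi-square critical value at the chosen
# alpha and dof to decide significance (e.g. via scipy.stats.chi2).
```

With real marker counts per emotion in place of the placeholders, the statistic and its p-value quantify whether marker positioning departs from independence across emotions.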