Understanding where and how emotions are represented in large-scale foundation models remains an open problem, particularly in multimodal affective settings. Despite the strong empirical performance of recent affective models, the internal architectural mechanisms that support affective understanding and generation are still poorly understood. In this work, we present a systematic mechanistic study of affective modeling in multimodal foundation models. Across multiple architectures, training strategies, and affective tasks, we analyze how emotion-oriented supervision reshapes internal model parameters. Our results consistently reveal a clear and robust pattern: affective adaptation is not concentrated in the attention modules but instead localizes to the feed-forward gating projection (\texttt{gate\_proj}). Through controlled module transfer, targeted single-module adaptation, and destructive ablation, we further demonstrate that \texttt{gate\_proj} is sufficient, efficient, and necessary for affective understanding and generation. Notably, by updating only approximately 24.5\% of the parameters that AffectGPT tunes, our approach achieves 96.6\% of its average performance across eight affective tasks, highlighting substantial parameter efficiency. Together, these findings provide empirical evidence that affective capabilities in foundation models are structurally mediated by feed-forward gating mechanisms, and they identify \texttt{gate\_proj} as a central architectural locus of affective modeling.