EmoMind: Decoding Affective Captions from Human Brain fMRI

Decoding visual experience from brain activity has advanced substantially, but current brain-to-text systems largely recover semantic content while discarding affect. Language models, for their part, can generate emotional text when prompted with categorical labels, but such labels collapse rich inter-subject variability into coarse discrete bins. We present EmoMind, the first end-to-end pipeline for decoding affective captions directly from fMRI signals. EmoMind first retrieves a semantically grounded, neutral scene description from brain-decoded visual features, then rewrites it conditioned on a continuous 34-dimensional emotion vector decoded from the same fMRI recording. To control the balance between content preservation and affective expression, we train the rewriter with classifier-free guidance against an identity-preserving null branch, which enables smooth interpolation between semantic fidelity and affective expressivity. We evaluate affective caption generation with a three-axis validation framework spanning subject-specificity, structural geometry, and causal control. We augment this framework with a synthetic-brain substitution test that probes robustness to the measurement apparatus, and we benchmark each axis against a strong discrete baseline: GPT-4 prompted with the brain-decoded top-5 emotion labels. Across two independent emotion fMRI datasets, EmoMind significantly outperforms label-prompted GPT-4 on all three axes, with the largest gains on metrics that require person-specific affective structure rather than population-level emotion aggregation. These results establish continuous brain-decoded affect as a viable control signal for individualized affective caption generation and open new directions for studying individual affective brain organisation.
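
The interpolation between semantic fidelity and affective expressivity can be illustrated with the standard classifier-free guidance combination of two sets of next-token logits. This is a minimal sketch, not the EmoMind implementation: the abstract does not state the exact guidance formula or where it is applied, so the function name `cfg_combine`, the parameter `scale`, and the toy logit values are all assumptions made for illustration.

```python
from typing import List


def cfg_combine(null_logits: List[float], cond_logits: List[float], scale: float) -> List[float]:
    """Blend two logit vectors with the usual classifier-free guidance rule.

    guided = null + scale * (cond - null)

    scale = 0 reproduces the null branch, scale = 1 reproduces the
    conditioned branch, and scale > 1 extrapolates past it.
    (Hypothetical helper; the source does not give this formula.)
    """
    if len(null_logits) != len(cond_logits):
        raise ValueError("logit vectors must have the same length")
    return [n + scale * (c - n) for n, c in zip(null_logits, cond_logits)]


# Toy logits over a three-word vocabulary (made-up numbers, not real data).
null_logits = [2.0, 0.5, -1.0]  # assumed null branch: keeps the neutral caption
cond_logits = [1.0, 1.5, 0.0]   # assumed branch conditioned on the emotion vector

for scale in (0.0, 0.5, 1.0, 2.0):
    print(scale, cfg_combine(null_logits, cond_logits, scale))
```

Under these assumptions, a guidance scale of 0 leaves the neutral description unchanged, and larger scales push the output further toward the emotion-conditioned branch.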
