Audio effects (FX) such as reverberation, distortion, modulation, and dynamic range processing play a pivotal role in shaping emotional responses during music listening. While prior studies have examined links between low-level audio features and affective perception, the systematic impact of audio FX on emotion remains underexplored. This work investigates how foundation models - large-scale neural architectures pretrained on multimodal data - can be leveraged to analyze these effects. Such models encode rich associations between musical structure, timbre, and affective meaning, offering a powerful framework for probing the emotional consequences of sound design techniques. By applying various probing methods to embeddings from deep learning models, we examine the complex, nonlinear relationships between audio FX and estimated emotion, uncovering patterns tied to specific effects and evaluating the robustness of foundation audio models. Our findings aim to advance understanding of the perceptual impact of audio production practices, with implications for music cognition, performance, and affective computing.
翻译:音频效果(如混响、失真、调制和动态范围处理)在塑造音乐聆听过程中的情感反应中起着关键作用。尽管先前的研究已探讨过低级音频特征与情感感知之间的联系,但音频效果对情感的系统性影响仍未得到充分探索。本研究探究如何利用基础模型——在多模态数据上预训练的大规模神经架构——来分析这些效果。此类模型编码了音乐结构、音色与情感意义之间的丰富关联,为探究声音设计技术的情感影响提供了强大框架。通过对深度学习模型嵌入表示应用多种探测方法,我们审视了音频效果与估计情感之间的复杂非线性关系,揭示了与特定效果相关的模式,并评估了基础音频模型的鲁棒性。我们的研究结果旨在增进对音频制作实践感知影响的理解,对音乐认知、表演和情感计算具有启示意义。