Audio effects (FX) such as reverberation, distortion, modulation, and dynamic range processing play a pivotal role in shaping emotional responses during music listening. While prior studies have examined links between low-level audio features and affective perception, the systematic impact of audio FX on emotion remains underexplored. This work investigates how foundation models, large-scale neural architectures pretrained on multimodal data, can be leveraged to analyze these effects. Such models encode rich associations between musical structure, timbre, and affective meaning, offering a powerful framework for probing the emotional consequences of sound design techniques. By applying probing methods to the embeddings these models produce, we examine the complex, nonlinear relationships between audio FX and estimated emotion, uncover patterns tied to specific effects, and evaluate the robustness of foundation audio models. Our findings aim to advance understanding of the perceptual impact of audio production practices, with implications for music cognition, performance, and affective computing.
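As a minimal illustration of the probing approach described above, the sketch below fits a ridge-regularized linear probe from fixed embeddings to a continuous emotion target (e.g. valence). The embeddings and ratings are simulated stand-ins, not outputs of any specific foundation model; names such as `X`, `y`, and the dimension choices are assumptions for demonstration only.

```python
import numpy as np

# Hypothetical setup: probe frozen foundation-model embeddings for an
# emotion rating (e.g. valence). Real embeddings would come from a
# pretrained audio model; here they are simulated with random data.
rng = np.random.default_rng(0)

n, d = 200, 64                      # number of clips x embedding dimension
X = rng.normal(size=(n, d))         # stand-in for model embeddings
w_true = rng.normal(size=d)         # hidden linear relation for the demo
y = X @ w_true + 0.1 * rng.normal(size=n)  # simulated valence ratings

# Ridge-regularized linear probe, closed form:
#   w = (X^T X + lam * I)^{-1} X^T y
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# R^2 of the probe indicates how linearly decodable the target is from
# the embedding space; low R^2 motivates the nonlinear probes mentioned
# in the abstract.
pred = X @ w
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 3))
```

In practice, a per-effect comparison (probe accuracy on dry vs. processed versions of the same clips) is one simple way to quantify how a given FX chain shifts the emotion estimates.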