Pitch is a fundamental aspect of auditory perception. Pitch perception is commonly described along two perceptual dimensions: pitch height is the sense that tones of different frequencies sound higher or lower, and chroma equivalence is the cyclical similarity of notes separated by octaves, where an octave corresponds to a doubling of fundamental frequency. Existing research is divided on whether chroma equivalence is a learned percept that varies with musical experience and culture, or an innate percept that develops automatically. Building on a recent framework that proposes using artificial neural networks (ANNs) to ask 'why' questions about the brain, we evaluated recent auditory ANNs using representational similarity analysis to test whether pitch height and chroma equivalence emerge in their learned representations. Additionally, we fine-tuned two models, Wav2Vec 2.0 and Data2Vec, on a self-supervised learning task using speech and music, and on a supervised music transcription task. We found that all models exhibited varying degrees of pitch height representation, but that only models trained on the supervised music transcription task exhibited chroma equivalence. Mere exposure to music through self-supervised learning was not sufficient for chroma equivalence to emerge. This supports the view that chroma equivalence is a higher-order cognitive computation that emerges to support the specific task of music perception, distinct from other forms of auditory perception such as speech listening. This work also highlights the usefulness of ANNs for probing the developmental conditions that give rise to perceptual representations in humans.