Semantic dimensions of sound have played a central role in understanding the nature of auditory sensory experience as well as the broader relation between perception, language, and meaning. Given the recent proliferation of large language models (LLMs), we asked whether such models exhibit an organisation of perceptual semantics similar to that observed in humans. Specifically, we prompted ChatGPT, a chatbot based on a state-of-the-art LLM, to rate musical instrument sounds on a set of 20 semantic scales. We elicited multiple responses in separate chats, analogous to having multiple human raters. ChatGPT generated semantic profiles that only partially correlated with human ratings, yet showed robust agreement along well-known psychophysical dimensions of musical sounds such as brightness (bright-dark) and pitch height (deep-high). Exploratory factor analysis suggested the same dimensionality but a different spatial configuration of the latent factor space for the chatbot and human ratings. Unexpectedly, the chatbot showed a degree of internal variability comparable in magnitude to that of human ratings. Our work highlights the potential of LLMs to capture salient dimensions of human sensory experience.