Music is an inherently social activity that allows people to share experiences and feel connected with one another. Yet there has been little progress in designing artificial partners that offer a social experience similar to playing with another person. Neural network architectures that implement generative models, such as large language models, are well suited to producing musical scores. Playing music socially, however, involves more than playing a score: the player must complement the other musicians' ideas and keep time correctly. We asked whether a convincing social experience can be achieved with a generative model trained to produce musical scores but not necessarily optimized for synchronization and continuation. The network, a variational autoencoder trained on a large corpus of digital scores, was adapted for a timed call-and-response task with a human partner. Participants played piano with a human or an artificial partner, in various configurations, and rated the performance quality and their first-person experience of self-other integration. Overall, the artificial partners held promise but were rated lower than human partners. The artificial partner with the simplest design and the highest similarity parameter was not rated differently from the human partners on some measures, suggesting that interactive rather than generative sophistication is important in enabling social AI.