In this article, we explore the potential of latent diffusion models, a family of powerful generative models, for the task of reconstructing naturalistic music from electroencephalogram (EEG) recordings. Unlike simpler music with limited timbres, such as MIDI-generated tunes or monophonic pieces, the focus here is on complex music featuring a diverse array of instruments, voices, and effects, rich in harmonics and timbre. This study represents an initial foray into achieving high-quality general music reconstruction from non-invasive EEG data, using an end-to-end training approach applied directly to raw data without manual pre-processing or channel selection. We train our models on the public NMED-T dataset and propose neural embedding-based metrics for quantitative evaluation. We additionally perform song classification based on the generated tracks. Our work contributes to ongoing research in neural decoding and brain-computer interfaces, offering insights into the feasibility of using EEG data for complex auditory information reconstruction.