Brain-computer interfaces aim to decode naturalistic stimuli from neural signals, yet most progress to date has focused on vision and language. In this article, we study a more challenging and far less explored setting, EEG-to-music reconstruction, where signals are weak, distributed, and highly susceptible to noise and channel variability. Our central finding is that mixing channels early in the model destroys weak but discriminative EEG signals. To address this, we propose a channel-oriented design with three key components: (i) channel-wise tokenization, which treats each electrode as an explicit token to retain spatially localized neural evidence; (ii) channel-wise multi-view self-distillation, which enforces consistency across temporal crops and random channel subsets to learn robust, distributed representations; and (iii) channel-wise data augmentation, which introduces structured channel dropout to improve invariance to noise, artifacts, and missing electrodes. Together, these components preserve weak yet informative signals across channels and enable stable alignment to a semantic music representation space. We integrate this channel-oriented design into an encoding-alignment-decoding pipeline for EEG-to-music reconstruction. Theoretically, we characterize the conditions under which preserving channel-level structure improves alignment. Empirically, we compare against a range of state-of-the-art baselines and demonstrate consistent and significant performance gains.
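As a rough illustration of the three channel-wise components named in the abstract, the following NumPy sketch shows one way they could look at the data level. It is not the authors' implementation: the function names (`channel_tokens`, `channel_dropout`, `make_view`), the array layout `(channels, time)`, the patch length, and the independent per-electrode dropout rule are all assumptions made for this example. The self-distillation loss itself is omitted; only the construction of views is shown.

```python
import numpy as np


def channel_tokens(eeg, patch_len):
    """Split each electrode's signal into patches without mixing channels.

    eeg: array of shape (channels, time).
    Returns an array of shape (channels, n_patches, patch_len), so every
    token belongs to exactly one electrode (hypothetical layout).
    """
    n_ch, n_t = eeg.shape
    n_patches = n_t // patch_len
    trimmed = eeg[:, : n_patches * patch_len]  # drop the ragged tail
    return trimmed.reshape(n_ch, n_patches, patch_len)


def channel_dropout(eeg, drop_prob, rng):
    """Zero out whole electrodes, each independently with drop_prob.

    Returns the masked copy and the boolean keep mask. At least one
    channel is always kept (an assumption made for this sketch).
    """
    n_ch = eeg.shape[0]
    keep = rng.random(n_ch) >= drop_prob
    if not keep.any():
        keep[rng.integers(n_ch)] = True
    return eeg * keep[:, None], keep


def make_view(eeg, crop_len, n_keep, rng):
    """Build one view: a random temporal crop of a random channel subset."""
    n_ch, n_t = eeg.shape
    start = rng.integers(0, n_t - crop_len + 1)
    chans = np.sort(rng.choice(n_ch, size=n_keep, replace=False))
    return eeg[chans, start : start + crop_len], chans


# Synthetic stand-in for one EEG trial: 32 electrodes, 1000 samples.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((32, 1000))

tokens = channel_tokens(eeg, patch_len=100)
masked, keep = channel_dropout(eeg, drop_prob=0.25, rng=rng)
view_a, chans_a = make_view(eeg, crop_len=400, n_keep=16, rng=rng)
view_b, chans_b = make_view(eeg, crop_len=400, n_keep=16, rng=rng)

print(tokens.shape, int(keep.sum()), view_a.shape, view_b.shape)
```

In a multi-view self-distillation setup, `view_a` and `view_b` would be encoded separately and their representations pushed toward agreement. Because each view sees a different electrode subset, the encoder cannot rely on any single channel.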