Recently, electroencephalography (EEG) signals have been actively used to decode brain responses to visual or textual stimuli and to achieve object recognition in multi-modal AI. Accordingly, efforts have focused on building EEG datasets from visual or textual single-modal stimuli. However, these datasets offer limited EEG epochs per category, and the complex semantics of the stimuli presented to participants compromise their quality and fidelity in capturing precise brain activity. Studies in neuroscience reveal that the relationship between visual and textual stimuli in EEG recordings provides valuable insights into the brain's ability to process and integrate multi-modal information simultaneously. Inspired by this, we propose EIT-1M, a novel large-scale multi-modal dataset with over 1 million EEG-image-text pairs. Our dataset is superior in its capacity to reflect brain activity during the simultaneous processing of multi-modal information. To build it, we collected data pairs while participants viewed alternating sequences of visual-textual stimuli drawn from 60K natural images and category-specific texts. Common semantic categories are included to elicit stronger responses from participants' brains. Moreover, response-based stimulus timing and repetition across blocks and sessions ensure data diversity. To verify the effectiveness of EIT-1M, we provide an in-depth analysis of EEG data captured under multi-modal stimuli across categories and participants, along with data quality scores for transparency. We demonstrate its validity on two tasks: 1) EEG recognition from visual stimuli, textual stimuli, or both, and 2) EEG-to-visual generation.