Recently, electroencephalography (EEG) signals have been actively used to decode brain responses to visual or textual stimuli and to achieve object recognition in multi-modal AI. Accordingly, efforts have focused on building EEG datasets from visual or textual single-modal stimuli. However, these datasets offer limited EEG epochs per category, and the complex semantics of the stimuli presented to participants compromise their quality and fidelity in capturing precise brain activity. Studies in neuroscience reveal that the relationship between visual and textual stimuli in EEG recordings provides valuable insights into the brain's ability to process and integrate multi-modal information simultaneously. Inspired by this, we propose EIT-1M, a novel large-scale multi-modal dataset with over 1 million EEG-image-text pairs. Our dataset is superior in its capacity to reflect brain activity during the simultaneous processing of multi-modal information. To build it, we collected data pairs while participants viewed alternating sequences of visual-textual stimuli drawn from 60K natural images and category-specific texts. Common semantic categories are included to elicit stronger responses from participants' brains. Moreover, response-based stimulus timing and repetition across blocks and sessions ensure data diversity. To verify the effectiveness of EIT-1M, we provide an in-depth analysis of EEG data captured under multi-modal stimuli across categories and participants, along with data quality scores for transparency. We demonstrate its validity on two tasks: 1) EEG recognition from visual stimuli, textual stimuli, or both, and 2) EEG-to-visual generation.