Reconstructing the images that subjects view during visual stimulation from their fMRI data has made strong progress over the past decade, thanks to the availability of large fMRI datasets and advances in generative image models. However, the applications of visual reconstruction remain limited. Reconstructing visual imagination is a greater challenge, with potentially revolutionary applications ranging from assisting people with disabilities to verifying witness accounts in court. The main hurdles in this field are the absence of data collection protocols for visual imagery and the lack of datasets on the subject. Traditionally, fMRI-to-image models rely on data collected while subjects are exposed to visual stimuli; because brain activity differs between visual stimulation and visual imagery, such models are poorly suited to reconstructing imagined images. For the first time, we have compiled a substantial dataset on visual imagery (around 6 hours of scans), together with a proposed data collection protocol. We then train a modified version of an fMRI-to-image model and demonstrate the feasibility of reconstructing images from two modes of imagination: from memory and from pure imagination. The resulting pipeline, which we call Mind-to-Image, marks a step towards a technology that allows direct reconstruction of visual imagery.