Two recent developments have accelerated progress in image reconstruction from human brain activity: large datasets that offer samples of brain activity in response to many thousands of natural scenes, and the open-sourcing of powerful stochastic image-generators that accept both low- and high-level guidance. Most work in this space has focused on obtaining point estimates of the target image, with the ultimate goal of approximating literal pixel-wise reconstructions of target images from the brain activity patterns they evoke. This emphasis belies the fact that there is always a family of images that are equally compatible with any evoked brain activity pattern, and the fact that many image-generators are inherently stochastic and do not by themselves offer a method for selecting the single best reconstruction from among the samples they generate. We introduce a novel reconstruction procedure (Second Sight) that iteratively refines an image distribution to explicitly maximize the alignment between the predictions of a voxel-wise encoding model and the brain activity patterns evoked by any target image. We show that our process converges on a distribution of high-quality reconstructions by refining both semantic content and low-level image details across iterations. Images sampled from these converged image distributions are competitive with state-of-the-art reconstruction algorithms. Interestingly, the time-to-convergence varies systematically across visual cortex, with earlier visual areas generally taking longer and converging on narrower image distributions, relative to higher-level brain areas. Second Sight thus offers a succinct and novel method for exploring the diversity of representations across visual brain areas.
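The iterative refinement described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes images are plain vectors, swaps in a hypothetical linear encoding model (`W`) for the voxel-wise encoding model, and uses a Gaussian sampler (`sample`) in place of a guided stochastic image generator. The greedy selection rule, the Pearson-correlation alignment score, and the geometric narrowing of the sampling spread are likewise assumptions chosen for illustration.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two 1-D arrays (0 if either is constant)."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def refine(beta, encode, sample, guide, n_iter=10, n_samples=50,
           spread=1.0, decay=0.7, rng=None):
    """Toy refinement loop (hypothetical, for illustration only).

    Each iteration samples candidate images around the current guide,
    scores each by how well its predicted activity matches the measured
    activity `beta`, keeps the best candidate seen so far as the new
    guide, and narrows the sampling distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    best_score = -np.inf
    history = []
    for _ in range(n_iter):
        candidates = sample(guide, spread, n_samples, rng)
        scores = np.array([pearson(encode(c), beta) for c in candidates])
        top = int(np.argmax(scores))
        if scores[top] > best_score:      # only accept improvements
            guide, best_score = candidates[top], float(scores[top])
        history.append(best_score)
        spread *= decay                   # assumed schedule: shrink the distribution
    return guide, spread, history

# Toy stand-ins (all assumptions): 16-dimensional "images", 200 "voxels".
rng = np.random.default_rng(0)
d, n_voxels = 16, 200
W = rng.normal(size=(n_voxels, d))                      # hypothetical encoding weights
x_true = rng.normal(size=d)                             # the unseen target "image"
beta = W @ x_true + 0.1 * rng.normal(size=n_voxels)     # simulated evoked activity

def encode(x):
    """Predict voxel activity for one image vector."""
    return W @ x

def sample(guide, spread, n, rng):
    """Draw n candidates from a Gaussian centred on the guide."""
    return guide + spread * rng.normal(size=(n, guide.shape[0]))

guide, spread, history = refine(beta, encode, sample, np.zeros(d), rng=rng)
```

In this sketch `history` records the best alignment score after each iteration, and the returned `guide` and `spread` together describe the final, narrower sampling distribution. The number of iterations needed before `history` plateaus is a rough analogue of the time-to-convergence the abstract refers to.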