Visual reconstruction algorithms are interpretive tools that map brain activity to pixels. Past reconstruction algorithms used brute-force search through a massive library to select candidate images that, when passed through an encoding model, accurately predict brain activity. Here, we use conditional generative diffusion models to extend and improve this search-based strategy. We decode a semantic descriptor from human brain activity (7T fMRI) in voxels across most of visual cortex, then use a diffusion model to sample a small library of images conditioned on this descriptor. We pass each sample through an encoding model, select the images that best predict brain activity, and then use these images to seed another library. We show that this process converges on high-quality reconstructions by refining low-level image details while preserving semantic content across iterations. Interestingly, the time to convergence differs systematically across visual cortex, suggesting a succinct new way to measure the diversity of representations across visual brain areas.
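The sample, score, select, and reseed loop described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the linear `encode` function stands in for the encoding model, `sample_library` replaces diffusion sampling with Gaussian perturbation of the seed (conditioning on the decoded semantic descriptor is omitted), the decreasing noise schedule is hypothetical, and only the single best image is kept as the seed, whereas the text describes selecting several images.

```python
import numpy as np

def encode(images, weights):
    """Toy linear encoding model (assumption): flattened images -> voxel activity."""
    return images @ weights

def sample_library(seed_image, strength, n_samples, rng):
    """Stand-in for diffusion sampling (assumption): perturb the seed with
    Gaussian noise scaled by `strength`."""
    noise = rng.standard_normal((n_samples, seed_image.size))
    return seed_image + strength * noise

def score(predicted, measured):
    """Pearson correlation between each predicted pattern and the measured one."""
    pc = predicted - predicted.mean(axis=1, keepdims=True)
    mc = measured - measured.mean()
    denom = np.linalg.norm(pc, axis=1) * np.linalg.norm(mc) + 1e-12
    return (pc @ mc) / denom

def iterative_search(measured, weights, n_pixels, n_iters=6, n_samples=50, seed=0):
    """Repeatedly sample a small library, score it, and reseed from the best image."""
    rng = np.random.default_rng(seed)
    best = np.zeros(n_pixels)      # hypothetical blank starting image
    best_score = -np.inf
    history = []
    # Hypothetical schedule: shrink the perturbation so later rounds refine details.
    for strength in np.linspace(1.0, 0.1, n_iters):
        library = sample_library(best, strength, n_samples, rng)
        scores = score(encode(library, weights), measured)
        top = int(np.argmax(scores))
        if scores[top] > best_score:   # keep the seed only if it improves
            best, best_score = library[top], float(scores[top])
        history.append(best_score)
    return best, history
```

In this sketch, `history` records the best encoding-model score after each iteration; counting the iterations until it plateaus is one simple way to think about a time to convergence, though the measure used in the study may be defined differently.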