Visual reconstruction algorithms are interpretive tools that map brain activity to pixels. Past reconstruction algorithms employed brute-force search through a massive library to select candidate images that, when passed through an encoding model, accurately predict brain activity. Here, we use conditional generative diffusion models to extend and improve this search-based strategy. We decode a semantic descriptor from human brain activity (7T fMRI) in voxels across most of visual cortex, then use a diffusion model to sample a small library of images conditioned on this descriptor. We pass each sample through an encoding model, select the images that best predict brain activity, and then use these images to seed another library. We show that this process converges on high-quality reconstructions by refining low-level image details while preserving semantic content across iterations. Interestingly, the time-to-convergence differs systematically across visual cortex, suggesting a succinct new way to measure the diversity of representations across visual brain areas.
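The sample, score, select, and reseed loop described above can be illustrated with a minimal toy sketch. It is not the implementation used in this work: images are stood in for by short lists of numbers, `toy_encode` is a hypothetical random linear map in place of a trained encoding model, and `toy_sample_library` is a hypothetical Gaussian sampler in place of a conditional diffusion model. The shrinking noise scale, the Pearson-correlation score, and all parameter values are assumptions made only for illustration.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5 if var_a > 0 and var_b > 0 else 0.0


def iterative_search(target, descriptor, sample_library, encode,
                     n_iterations=6, library_size=20, n_keep=5):
    """Sample a library, score it against measured activity, reseed from the best."""
    seeds, best_image, best_score, history = None, None, float("-inf"), []
    for iteration in range(n_iterations):
        # Draw a small library conditioned on the descriptor and current seeds.
        library = sample_library(descriptor, seeds, library_size, iteration)
        # Score each candidate by how well its predicted activity matches the target.
        scored = sorted(((pearson(encode(img), target), img) for img in library),
                        key=lambda pair: pair[0], reverse=True)
        # The top candidates seed the next library.
        seeds = [img for _, img in scored[:n_keep]]
        # Track the best candidate seen so far (an assumption of this sketch).
        if scored[0][0] > best_score:
            best_score, best_image = scored[0]
        history.append(best_score)
    return best_image, history


# Hypothetical stand-ins, for illustration only.
rng = random.Random(0)
DIM, N_VOXELS = 16, 32
weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_VOXELS)]


def toy_encode(image):
    """Toy encoding model: a fixed random linear map from image to voxels."""
    return [sum(w * x for w, x in zip(row, image)) for row in weights]


def toy_sample_library(descriptor, seeds, size, iteration):
    """Toy sampler: perturb the descriptor (first pass) or a randomly chosen seed."""
    scale = 0.6 ** iteration  # assumed schedule: later libraries vary less
    centers = [descriptor] if seeds is None else seeds
    return [[c + scale * rng.gauss(0, 1) for c in rng.choice(centers)]
            for _ in range(size)]


true_image = [rng.gauss(0, 1) for _ in range(DIM)]
target = toy_encode(true_image)                         # simulated brain activity
descriptor = [x + rng.gauss(0, 1) for x in true_image]  # noisy "decoded" descriptor
best_image, history = iterative_search(target, descriptor,
                                       toy_sample_library, toy_encode)
```

In this sketch, `history` records the best score found up to each iteration, so the number of iterations needed before it plateaus gives a rough toy analogue of a time-to-convergence measure.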