Despite advances in artificial intelligence, object recognition models still fall short of emulating how the human brain processes visual information. Recent studies have highlighted the potential of using neural data to mimic brain processing; however, these studies often rely on invasive neural recordings from non-human subjects, leaving a critical gap in our understanding of human visual perception. To address this gap, we present, for the first time, 'Re(presentational)Al(ignment)net' (ReAlnet), a vision model aligned with human brain activity based on non-invasive EEG, which demonstrates significantly higher similarity to human brain representations. Our image-to-brain multi-layer encoding framework advances human neural alignment by optimizing multiple model layers, enabling the model to efficiently learn and mimic the human brain's visual representational patterns across object categories and different modalities. Our findings suggest that ReAlnet represents a breakthrough in bridging the gap between artificial and human vision and paves the way for more brain-like artificial intelligence systems.