In this study, we attempt to model intuition and incorporate this formalism to improve the performance of Convolutional Neural Networks (CNNs). Despite decades of research, the principles of intuition remain ambiguous. Experimental psychology has identified many types of intuition, which depend on the state of the human mind. We focus on visual intuition, which helps complete missing information during visual cognitive tasks. First, we design an experiment that gradually reduces the amount of visual information in the images of a dataset to examine its impact on CNN accuracy. Then, we present a model of visual intuition based on Gestalt theory. The theory holds that humans derive a set of templates from their subconscious experiences. When the brain detects that information in a scene is missing, for example because of occlusion, it instantaneously completes it by replacing the missing parts with the most similar templates. Based on Gestalt theory, we model visual intuition in two layers, which are described in detail throughout the paper. We use the MNIST dataset to test how well the proposed intuition model completes missing information. Experiments show that the augmented CNN architecture outperforms classic models on incomplete images.
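The completion step described above (find the most similar stored template, then fill the missing parts from it) can be illustrated with a minimal nearest-template sketch. This is not the paper's two-layer model, whose details the abstract does not give. It assumes that images are small grayscale grids, that "gradually reducing visual information" means hiding the bottom rows, and that similarity is the squared error over visible pixels only. The names `occlude_bottom`, `masked_distance`, and `complete` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]
Mask = List[List[bool]]  # True where the pixel is observed


def occlude_bottom(image: Image, fraction: float) -> Tuple[Image, Mask]:
    """Hide the bottom `fraction` of rows (assumed occlusion scheme, not the paper's)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    height = len(image)
    first_hidden = height - round(height * fraction)
    occluded: Image = []
    observed: Mask = []
    for r, row in enumerate(image):
        visible = r < first_hidden
        # Hidden pixels are zeroed, mimicking lost visual information.
        occluded.append([px if visible else 0.0 for px in row])
        observed.append([visible] * len(row))
    return occluded, observed


def masked_distance(image: Image, observed: Mask, template: Image) -> float:
    """Sum of squared differences, counting observed pixels only."""
    return sum(
        (p - t) ** 2
        for img_row, mask_row, tpl_row in zip(image, observed, template)
        for p, m, t in zip(img_row, mask_row, tpl_row)
        if m
    )


def complete(image: Image, observed: Mask, templates: List[Image]) -> Tuple[Image, int]:
    """Fill unobserved pixels from the template most similar on the visible part."""
    best = min(range(len(templates)),
               key=lambda i: masked_distance(image, observed, templates[i]))
    tpl = templates[best]
    filled = [
        [p if m else t for p, m, t in zip(img_row, mask_row, tpl_row)]
        for img_row, mask_row, tpl_row in zip(image, observed, tpl)
    ]
    return filled, best


if __name__ == "__main__":
    # Toy 4x4 "digits": a vertical bar and a box-like stroke (illustrative data only).
    bar = [[0.0, 1.0, 0.0, 0.0] for _ in range(4)]
    box = [[1.0] * 4] + [[0.0, 0.0, 0.0, 1.0] for _ in range(3)]
    occluded, observed = occlude_bottom(bar, 0.5)  # hide the bottom two rows
    filled, best = complete(occluded, observed, [bar, box])
    print(best, filled == bar)  # 0 True
```

Matching only on visible pixels is the design choice that matters here. Occluded pixels carry no evidence, so including them would bias the choice toward templates that happen to be dark in the hidden region. In the augmented architecture, the completed image would then be passed to the CNN in place of the incomplete one.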