In image recognition, a particular class of multi-label classification (MLC) problems arises when objects in a scene occlude one another, so that both the occluded and the occluding objects must be identified at the same time. Conventional convolutional neural networks (CNNs) can handle such problems, but the models tend to be large and reach only modest accuracy. Drawing on recent neuroscience research on the Holistic Bursting (HB) cell, this paper introduces an integrated network framework named HB-net. Built from clusters of HB cells, HB-net is designed to recognize multiple mutually occluding objects in an image simultaneously. Several bursting-cell cluster structures are introduced, together with an evidence accumulation (EA) mechanism. Experiments are conducted on multiple datasets of digits and letters. Models incorporating the HB framework achieve a statistically significant $2.98\%$ improvement in recognition accuracy over models without it (a factor of $1.0298$, $p=0.0499$). Although standard CNNs are slightly more robust than HB-net models in high-noise settings, models that combine the HB framework with the EA mechanism reach accuracy and robustness comparable to ResNet50 with only three convolutional layers and approximately $1/30$ of the parameters. These findings offer insights for improving computer vision algorithms. The essential code is available at https://github.com/d-lab438/hb-net.git.
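To make the multi-label setting concrete, the following minimal sketch shows how a classifier can report both an occluding and an occluded digit at once: each class is scored independently and every class above a threshold is returned. The logit values, the function names, and the averaging step used as a stand-in for accumulating evidence are all illustrative assumptions; they are not taken from HB-net or its repository.

```python
import math

def sigmoid(x: float) -> float:
    """Map a logit to an independent per-class probability."""
    return 1.0 / (1.0 + math.exp(-x))

def accumulate_evidence(logit_steps):
    """Average per-class logits over several observations.

    Hypothetical stand-in for an evidence accumulation step;
    the paper's actual EA mechanism may differ.
    """
    n_steps = len(logit_steps)
    n_classes = len(logit_steps[0])
    return [sum(step[c] for step in logit_steps) / n_steps
            for c in range(n_classes)]

def predict_labels(logits, threshold=0.5):
    """Return every class whose sigmoid score reaches the threshold."""
    return [c for c, z in enumerate(logits) if sigmoid(z) >= threshold]

# Made-up logits for digits 0-9 from three noisy looks at an image
# in which a "3" partly covers a "7".
steps = [
    [-4.0, -3.0, -2.5, 2.0, -3.0, -2.0, -3.5, 0.4, -1.0, -2.0],
    [-3.5, -3.2, -2.0, 2.5, -2.8, -2.2, -3.0, -0.2, -0.5, -2.5],
    [-4.2, -2.9, -2.2, 1.8, -3.1, -1.9, -3.3, 0.7, -1.2, -2.1],
]

mean_logits = accumulate_evidence(steps)
labels = predict_labels(mean_logits)
single_look = predict_labels(steps[1])
print(labels, single_look)
```

In this toy example the second observation alone misses the occluded digit, while the averaged scores recover both labels. Unlike a softmax, the independent sigmoid outputs do not force the classes to compete, which is what allows two objects to be reported for one image.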