Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection

Brain-computer interfaces can extract advanced cognition from the human brain. Integrating these interfaces with computer vision techniques, which offer efficient feature extraction, enables more robust and accurate detection of dim targets in aerial images. However, existing target detection methods concentrate primarily on homogeneous data and lack efficient, versatile processing capabilities for heterogeneous multimodal data. In this paper, we first build a brain-eye-computer based object detection system for aerial images under few-shot conditions. This system detects suspicious targets using region proposal networks, evokes event-related potential (ERP) signals in the electroencephalogram (EEG) through the eye-tracking-based slow serial visual presentation (ESSVP) paradigm, and constructs EEG-image data pairs with eye movement data. We then propose an adaptive modality balanced online knowledge distillation (AMBOKD) method to recognize dim objects from the EEG-image data. AMBOKD fuses EEG and image features using a multi-head attention module, establishing a new modality with comprehensive features. To enhance the performance and robustness of the fusion modality, end-to-end online knowledge distillation enables simultaneous training and mutual learning between modalities. During learning, an adaptive modality balancing module ensures multimodal equilibrium by dynamically adjusting the importance weights and the training gradients of the modalities. Comparisons with existing state-of-the-art methods demonstrate the effectiveness and superiority of our method. Additionally, experiments on public datasets and system validations in real-world scenarios demonstrate the reliability and practicality of the proposed system and method.
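The mutual-learning idea in the abstract can be illustrated with a small sketch. The abstract does not give the loss formulation, so everything below is an assumption made for illustration: three classifier heads (EEG, image, fusion) exchange temperature-softened predictions through KL-divergence terms, and a hypothetical balancing rule gives more teaching weight to modalities with lower cross-entropy. The function names, the weighting rule, and the example logits are not taken from the paper, and gradient adjustment is not modeled.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(logits, label):
    """Cross-entropy of a single sample against an integer class label."""
    return -math.log(softmax(logits)[label] + 1e-12)

def balance_weights(logits_by_modality, label):
    """Hypothetical balancing rule (an assumption, not the paper's module):
    modalities with lower cross-entropy receive a larger teaching weight."""
    losses = {name: cross_entropy(l, label) for name, l in logits_by_modality.items()}
    return dict(zip(losses, softmax([-v for v in losses.values()])))

def mutual_distillation_losses(logits_by_modality, label, temperature=2.0):
    """Per-modality loss: supervised cross-entropy plus a weighted KL term
    pulling each student toward the softened predictions of its peers."""
    weights = balance_weights(logits_by_modality, label)
    soft = {n: softmax(l, temperature) for n, l in logits_by_modality.items()}
    losses = {}
    for student, student_logits in logits_by_modality.items():
        peers = [n for n in soft if n != student]
        norm = sum(weights[p] for p in peers)
        kd = sum(weights[p] / norm * kl_divergence(soft[p], soft[student])
                 for p in peers)
        # The T^2 factor is the usual rescaling for softened distillation terms.
        losses[student] = cross_entropy(student_logits, label) + temperature ** 2 * kd
    return weights, losses

# Made-up logits for one two-class sample (class 0 = target).
logits = {"eeg": [0.2, 0.4], "image": [1.5, -0.5], "fusion": [2.0, -1.0]}
weights, losses = mutual_distillation_losses(logits, label=0)
```

In this toy sample the fusion head is the most confident about the correct class, so it receives the largest teaching weight, while the EEG head, which is wrong, receives the smallest and incurs the largest loss. In a real training loop the three losses would be minimized jointly in one pass, which is what makes the distillation "online".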
