Attention models have recently emerged as a powerful approach and have driven significant progress in several fields. Visualization techniques such as class activation mapping give visual insight into the reasoning of convolutional neural networks (CNNs). Network gradients make it possible to identify the regions a network attends to during image recognition. These gradients can also be combined with CNN features to localize more generalizable, task-specific attentive (salient) regions in a scene. However, gradient-based attention has rarely been integrated explicitly into CNN representations for semantic object understanding. Such integration is particularly beneficial for visual tasks such as simultaneous localization and mapping (SLAM), where CNN representations enriched with the locations of spatially attentive objects can improve performance. In this work, we propose using task-specific network attention for RGB-D indoor SLAM. Specifically, we integrate layer-wise attention information derived from network gradients with CNN feature representations to improve frame association. Experimental results show improved performance over baseline methods, particularly in large environments.
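The idea of combining gradient-derived attention with CNN features can be illustrated with a minimal NumPy sketch. The abstract does not specify the actual formulation, so everything here is an assumption: the attention map follows the well-known Grad-CAM recipe (spatially averaged gradients as channel weights), the descriptor is a simple attention-weighted sum pooling of one layer, and frame association is reduced to cosine similarity. The function names `gradient_attention`, `attentive_descriptor`, and `frame_similarity` are hypothetical.

```python
import numpy as np


def gradient_attention(features, grads):
    """Grad-CAM-style attention map for a single layer (assumed recipe).

    features, grads: arrays of shape (C, H, W); grads holds the gradient of
    a task score with respect to the features.
    """
    # Channel weights: spatially averaged gradients.
    weights = grads.mean(axis=(1, 2))                       # shape (C,)
    # Weighted sum over channels, then ReLU to keep positive evidence only.
    cam = np.maximum(np.tensordot(weights, features, axes=1), 0.0)  # (H, W)
    # Normalise to [0, 1]; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def attentive_descriptor(features, grads):
    """Attention-weighted, L2-normalised pooled descriptor (hypothetical)."""
    attn = gradient_attention(features, grads)              # (H, W)
    # Broadcast the attention map over channels and sum-pool spatially.
    pooled = (features * attn).sum(axis=(1, 2))             # (C,)
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled


def frame_similarity(desc_a, desc_b):
    """Cosine similarity between two L2-normalised descriptors."""
    return float(np.dot(desc_a, desc_b))
```

In this sketch, a layer-wise variant would compute one descriptor per layer and concatenate them, and two frames would be treated as association candidates when `frame_similarity` exceeds a chosen threshold. Both choices are illustrative and not taken from the paper.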