SCAResNet: A ResNet Variant Optimized for Tiny Object Detection in Transmission and Distribution Towers

Traditional deep learning-based object detection networks often resize images during data preprocessing so that the feature maps have a uniform size and scale, which facilitates model propagation and fully connected classification. However, resizing inevitably deforms objects and discards valuable image information. This drawback is particularly pronounced for tiny objects such as distribution towers, which have linear shapes and occupy few pixels. To address this issue, we propose abandoning the resizing operation. Instead, we introduce Positional-Encoding Multi-head Criss-Cross Attention, which allows the model to capture contextual information and learn from multiple representation subspaces, effectively enriching the semantics of distribution towers. Additionally, we enhance Spatial Pyramid Pooling by reshaping three pooled feature maps into a new unified one while also reducing the computational burden. This allows images of different sizes and scales to produce feature maps with uniform dimensions that can be used in feature map propagation. Our SCAResNet incorporates these improvements into the ResNet backbone. We evaluated SCAResNet on the Electric Transmission and Distribution Infrastructure Imagery dataset from Duke University. Without any additional tricks, we used various object detection models with Gaussian Receptive Field based Label Assignment as baselines. Incorporating SCAResNet into the baseline model yielded a 2.1% improvement in mAPs. This demonstrates the advantages of SCAResNet in detecting transmission and distribution towers and its value for tiny object detection. The source code is available at https://github.com/LisavilaLee/SCAResNet_mmdet.
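
To illustrate why pyramid pooling removes the need for resizing, the following is a minimal, dependency-free sketch of classic Spatial Pyramid Pooling, not the enhanced variant proposed here. The abstract does not specify how the three pooled maps are reshaped into a unified one, so this sketch simply flattens and concatenates them. The pyramid levels `(1, 2, 4)` and the helper names `adaptive_max_pool`, `spp`, and `make_map` are assumptions made for illustration.

```python
import math

def adaptive_max_pool(fmap, out_size):
    """Max-pool a 2D map (list of rows) into out_size x out_size bins."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(out_size):
        # Bin edges use floor for the start and ceil for the end,
        # so every bin is non-empty regardless of the input size.
        r0, r1 = (i * h) // out_size, math.ceil((i + 1) * h / out_size)
        row = []
        for j in range(out_size):
            c0, c1 = (j * w) // out_size, math.ceil((j + 1) * w / out_size)
            row.append(max(fmap[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

def spp(fmap, levels=(1, 2, 4)):
    """Concatenate the pooled bins of every pyramid level into one vector."""
    out = []
    for n in levels:  # three levels, chosen here as an assumption
        for row in adaptive_max_pool(fmap, n):
            out.extend(row)
    return out

def make_map(h, w):
    """Build a toy single-channel feature map of size h x w."""
    return [[(r * w + c) % 7 for c in range(w)] for r in range(h)]

small = spp(make_map(5, 9))    # 5 x 9 input
large = spp(make_map(13, 6))   # 13 x 6 input
print(len(small), len(large))  # prints "21 21" (1 + 4 + 16 bins each)
```

Both inputs yield a 21-element output even though their spatial sizes differ, which is the property that lets un-resized images feed layers expecting fixed dimensions.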
