Automatic tree density estimation and counting from single aerial and satellite images is a challenging task in photogrammetry and remote sensing, yet it plays an important role in forest management. In this paper, we propose the first semi-supervised transformer-based framework for tree counting, which reduces the need for expensive tree annotations in remote sensing images. Our method, termed TreeFormer, first develops a pyramid tree representation module based on transformer blocks to extract multi-scale features during the encoding stage. Contextual attention-based feature fusion and tree density regressor modules are then designed to use the robust features from the encoder to estimate tree density maps in the decoder. Moreover, we propose a pyramid learning strategy that includes local tree density consistency and local tree count ranking losses to incorporate unlabeled images into the training process. Finally, a tree counter token is introduced to regulate the network by computing the global tree counts for both labeled and unlabeled images. Our model was evaluated on two benchmark tree counting datasets, Jiangsu and Yosemite, as well as a new dataset, KCL-London, that we created. TreeFormer outperforms state-of-the-art semi-supervised methods under the same setting and surpasses fully supervised methods that use the same number of labeled images. The code and datasets are available at https://github.com/HAAClassic/TreeFormer.
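The idea behind a count ranking constraint on unlabeled images can be illustrated with a minimal sketch. It is not TreeFormer's implementation: the function names (`nested_center_crops`, `count_ranking_loss`, `global_count`), the use of nested center crops, and the hinge form of the penalty are all assumptions chosen for illustration. The sketch relies only on the fact that a sub-region of an image cannot contain more trees than the region enclosing it, so a predicted density map that violates this ordering can be penalized without any annotation.

```python
import numpy as np

def global_count(density):
    """Estimated number of trees: the integral (sum) of the density map."""
    return float(density.sum())

def nested_center_crops(density, num_levels=3):
    """Return center crops of decreasing size; each crop encloses the next.

    Hypothetical helper: halving the side length per level is an assumption.
    """
    h, w = density.shape
    crops = []
    for level in range(num_levels):
        ch, cw = h // (2 ** level), w // (2 ** level)
        top, left = (h - ch) // 2, (w - cw) // 2
        crops.append(density[top:top + ch, left:left + cw])
    return crops

def count_ranking_loss(density, num_levels=3, margin=0.0):
    """Hinge penalty whenever an inner crop's count exceeds its outer crop's."""
    counts = [crop.sum() for crop in nested_center_crops(density, num_levels)]
    loss = 0.0
    for outer, inner in zip(counts[:-1], counts[1:]):
        # No labels needed: only the ordering inner <= outer is enforced.
        loss += max(0.0, float(inner - outer) + margin)
    return loss
```

For any non-negative density map the loss is zero, since enlarging a region can only add to its count. The penalty becomes active only when the predicted map is inconsistent, for example when it contains negative values outside the inner crop.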