Accurate delineation of agricultural field boundaries is essential for effective crop monitoring and resource management. However, existing methods face significant challenges, particularly their reliance on extensive manual curation of cloud-free imagery and their limited adaptability to diverse global conditions. In this paper, we introduce PTAViT3D, a deep learning architecture designed specifically for processing three-dimensional time series of satellite imagery from either Sentinel-1 (S1) or Sentinel-2 (S2). We also present PTAViT3D-CA, an extension of PTAViT3D that incorporates cross-attention mechanisms to fuse S1 and S2 data, improving robustness in cloud-contaminated scenarios. The proposed methods exploit spatio-temporal correlations through a memory-efficient 3D Vision Transformer architecture, enabling accurate boundary delineation directly from raw, cloud-contaminated imagery. We validate our models extensively on several datasets, including ePaddocks (CSIRO's national agricultural field boundary product for Australia) and the public benchmarks Fields-of-the-World, PASTIS, and AI4SmallFarms. Our results consistently demonstrate state-of-the-art performance, with excellent global transferability and robustness. Crucially, because our approach reliably processes cloud-affected imagery, it significantly simplifies data preparation workflows and adapts well across diverse agricultural environments. Our code and models are publicly available at https://github.com/feevos/tfcl.
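The abstract names cross-attention as the mechanism PTAViT3D-CA uses to fuse S1 and S2 inputs. As a minimal illustration of the general technique only, the sketch below implements generic single-head scaled dot-product cross-attention in NumPy. Tokens from one sensor act as queries and tokens from the other act as keys and values. This is not the authors' implementation. The token counts, dimensions, random projection weights, and the choice of S2 as the query side are all hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Generic single-head scaled dot-product cross-attention (illustrative sketch).

    queries: (n_q, d_model) tokens from one modality (hypothetically S2)
    context: (n_kv, d_model) tokens from the other modality (hypothetically S1)
    Returns (output, weights) with shapes (n_q, d_head) and (n_q, n_kv).
    """
    q = queries @ w_q  # project query tokens: (n_q, d_head)
    k = context @ w_k  # project context tokens to keys: (n_kv, d_head)
    v = context @ w_v  # project context tokens to values: (n_kv, d_head)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled similarities: (n_q, n_kv)
    weights = softmax(scores, axis=-1)  # normalize over the context tokens
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model, d_head = 16, 8  # hypothetical embedding sizes
s2_tokens = rng.normal(size=(6, d_model))  # hypothetical: 6 S2 time-step tokens
s1_tokens = rng.normal(size=(10, d_model))  # hypothetical: 10 S1 time-step tokens
# Hypothetical random projections; a trained model would learn these.
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

fused, attn = cross_attention(s2_tokens, s1_tokens, w_q, w_k, w_v)
print(fused.shape, attn.shape)  # (6, 8) (6, 10)
```

Each row of `attn` sums to one, so every fused token is a weighted average of the projected tokens from the other sensor. Stacking several heads, adding an output projection, and applying residual connections would bring this closer to a standard transformer block.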