Agricultural parcel extraction plays an important role in remote-sensing-based agricultural monitoring, supporting parcel surveying, precision management, and ecological assessment. However, existing public benchmarks mainly focus on regular and relatively flat farmland scenes. In contrast, terraced parcels in mountainous regions exhibit stepped terrain, pronounced elevation variation, irregular boundaries, and strong cross-regional heterogeneity, making parcel extraction a more challenging problem that jointly requires visual recognition, semantic discrimination, and terrain-aware geometric understanding. Although recent studies have advanced visual parcel benchmarks and image-text farmland understanding, a unified benchmark for complex terraced parcel extraction under aligned image, text, and digital elevation model (DEM) settings remains absent. To fill this gap, we present GTPBD-MM, the first multimodal benchmark for global terraced parcel extraction. Built upon GTPBD, GTPBD-MM integrates high-resolution optical imagery, structured text descriptions, and DEM data, and supports systematic evaluation under Image-only, Image+Text, and Image+Text+DEM settings. We further propose the Elevation- and Text-guided Terraced parcel network (ETTerra), a multimodal baseline for terraced parcel delineation. Extensive experiments demonstrate that textual semantics and terrain geometry provide complementary cues beyond visual appearance alone, yielding more accurate, coherent, and structurally consistent delineation results in complex terraced scenes.