Accurate weed mapping in cereal fields requires pixel-level segmentation from UAV imagery that remains reliable across fields, seasons, and illumination conditions. Existing multispectral pipelines often rely either on thresholded vegetation indices, which are brittle under radiometric drift and mixed crop-weed pixels, or on single-stream CNN and Transformer backbones that ingest stacked bands and indices. In the latter, radiance cues and normalized index cues interfere, which reduces sensitivity to small weed clusters embedded in the crop canopy. We propose VISA, a two-stream segmentation network that decouples these cues and fuses them at native resolution. The radiance stream learns from calibrated five-band reflectance using local residual convolutions, channel recalibration, spatial gating, and skip-connected decoding, which preserve the fine textures, row boundaries, and small weed structures that ratio-based index compression often weakens. The index stream operates on vegetation-index maps, using windowed self-attention to model local structure efficiently, state-space layers to propagate field-scale context without the quadratic cost of attention, and Slot Attention to form stable region descriptors that improve discrimination of sparse weeds under canopy mixing. To support supervised training and deployment-oriented evaluation, we introduce BAWSeg, a four-year UAV multispectral dataset collected over commercial barley paddocks in Western Australia. It provides radiometrically calibrated blue, green, red, red edge, and near-infrared orthomosaics, derived vegetation indices, and dense labels for three classes (crop, weed, and other), with leakage-free block splits. On BAWSeg, VISA achieves 75.6% mIoU and 63.5% weed IoU with 22.8M parameters, outperforming a multispectral SegFormer-B1 baseline by 1.2 percentage points in mIoU and 1.9 percentage points in weed IoU. Under cross-plot and cross-year protocols, VISA maintains 71.2% and 69.2% mIoU, respectively.
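To make the distinction between radiance cues and normalized index cues concrete, the following is a minimal sketch of two widely used normalized-difference indices computed from per-pixel reflectance. The abstract does not state which indices BAWSeg derives, so the choice of NDVI and NDRE here is an assumption, and the function names and the `eps` stabilizer are hypothetical.

```python
# Illustrative sketch only: NDVI and NDRE are common vegetation indices,
# but the abstract does not say which indices BAWSeg actually provides.

def normalized_difference(a: float, b: float, eps: float = 1e-8) -> float:
    """Return (a - b) / (a + b); eps guards against division by zero."""
    return (a - b) / (a + b + eps)


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return normalized_difference(nir, red)


def ndre(nir: float, red_edge: float) -> float:
    """Normalized Difference Red Edge index from NIR and red-edge reflectance."""
    return normalized_difference(nir, red_edge)


# A vegetated pixel, once brightly lit and once at half the brightness.
bright = {"red": 0.05, "red_edge": 0.20, "nir": 0.45}
dim = {band: 0.5 * value for band, value in bright.items()}

ndvi_bright = ndvi(bright["nir"], bright["red"])
ndvi_dim = ndvi(dim["nir"], dim["red"])
ndre_bright = ndre(bright["nir"], bright["red_edge"])
```

In this toy case both pixels give nearly the same NDVI even though their reflectance differs by a factor of two. The ratio discards absolute radiance, which is one way to read the abstract's argument for keeping a separate radiance stream alongside the index stream.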
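The reported metrics, mIoU and weed IoU, follow the standard intersection-over-union definition. The sketch below computes them from flattened label sequences in plain Python. It illustrates the metric only and is not the authors' evaluation code; the class index order and the function names are assumptions.

```python
from typing import List, Sequence

# Class names come from the dataset description; the index order is assumed.
CLASSES = ("crop", "weed", "other")


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """cm[t][p] counts pixels with true class t and predicted class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[float]:
    """IoU_c = TP / (TP + FP + FN); NaN if the class never occurs."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true otherwise
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(ious: Sequence[float]) -> float:
    """Average IoU over classes that occur (NaN entries are skipped)."""
    valid = [v for v in ious if v == v]  # NaN is not equal to itself
    return sum(valid) / len(valid)


# Toy example with eight pixels.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2, 2]

cm = confusion_matrix(y_true, y_pred, len(CLASSES))
ious = per_class_iou(cm)
weed_iou = ious[CLASSES.index("weed")]
miou = mean_iou(ious)
```

In this toy example the weed class has the lowest IoU, because a single missed pixel and a single false alarm weigh heavily on a class with few pixels. This is why weed IoU is reported separately from mIoU.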