Change detection plays a fundamental role in Earth observation for analyzing changes over time. However, recent studies have largely neglected multimodal data, which offers significant practical and technical advantages over single-modal approaches. This research focuses on leveraging digital surface model (DSM) data and aerial images captured at different times to detect change beyond 2D. We observe that current change detection methods struggle with multitask conflicts between the semantic and height change detection tasks. To address this challenge, we propose an efficient Transformer-based network that learns a shared representation between cross-dimensional inputs through cross-attention. It adopts a consistency constraint to establish the multimodal relationship: a pseudo change map is obtained by thresholding the height change, and the difference between the semantic change and the pseudo change is minimized within their overlapping regions. We construct a DSM-to-image multimodal dataset covering three cities in the Netherlands, which lays a new foundation for beyond-2D change detection from cross-dimensional inputs. Compared with five state-of-the-art change detection methods, our model demonstrates consistent multitask superiority in both semantic and height change detection. Furthermore, the consistency strategy can be readily applied to other methods, yielding promising improvements.
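As an illustration only, the consistency constraint can be sketched in NumPy as below. The abstract does not specify the threshold, the distance measure, or how the overlapping regions are defined, so every such choice here is an assumption: the function names, the 1.0 height threshold, the 0.5 semantic cutoff, the mean absolute difference, and the reading of "overlapping regions" as pixels flagged as changed by both maps. It should not be taken as the authors' implementation.

```python
import numpy as np


def pseudo_change_from_height(height_change, threshold=1.0):
    """Threshold the absolute height change into a binary pseudo change map.

    `threshold` (here in the same units as the DSM) is an assumed value.
    """
    return (np.abs(height_change) > threshold).astype(np.float64)


def consistency_loss(semantic_prob, height_change, threshold=1.0, cutoff=0.5):
    """Mean absolute difference between semantic and pseudo change,
    restricted to pixels where both maps indicate change.

    Assumption: "overlapping regions" is read as the intersection of the
    thresholded semantic prediction and the pseudo change map.
    """
    pseudo = pseudo_change_from_height(height_change, threshold)
    overlap = (semantic_prob > cutoff) & (pseudo > 0)
    if not overlap.any():
        # No overlapping pixels: the constraint contributes nothing.
        return 0.0
    return float(np.mean(np.abs(semantic_prob[overlap] - pseudo[overlap])))


# Toy 2x2 example with hypothetical values.
height_change = np.array([[0.2, 3.0], [-2.5, 0.1]])
semantic_prob = np.array([[0.9, 0.8], [0.3, 0.1]])
loss = consistency_loss(semantic_prob, height_change)
```

In a training setting this term would presumably be added to the semantic and height change losses with some weighting, and a differentiable framework would replace NumPy; neither detail is given in the abstract.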