SeMAnD: Self-Supervised Anomaly Detection in Multimodal Geospatial Datasets

Source: arXiv. Extended version of the research track paper accepted at the 31st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2023), Hamburg, Germany. 11 pages, 8 figures, 6 tables.

We propose a Self-supervised Anomaly Detection technique, called SeMAnD, to detect geometric anomalies in Multimodal geospatial datasets. Geospatial data comprises acquired and derived heterogeneous data modalities, which we transform into semantically meaningful, image-like tensors to address the challenges of representation, alignment, and fusion of multimodal data. SeMAnD consists of (i) a simple data augmentation strategy, called RandPolyAugment, capable of generating diverse augmentations of vector geometries, and (ii) a self-supervised training objective with three components that incentivize learning representations of multimodal data that are discriminative to local changes in one modality that are not corroborated by the other modalities. Detecting local defects is crucial for geospatial anomaly detection, where even small anomalies (e.g., shifted, incorrectly connected, malformed, or missing polygonal vector geometries such as roads, buildings, and landcover) are detrimental to the experience and safety of users of geospatial applications like mapping, routing, search, and recommendation systems. Our empirical study on test sets of different types of real-world geometric geospatial anomalies across 3 diverse geographical regions demonstrates that SeMAnD is able to detect real-world defects and outperforms domain-agnostic anomaly detection strategies by 4.8-19.7% as measured using anomaly classification AUC. We also show that model performance increases (i) by up to 20.4% as the number of input modalities increases and (ii) by up to 22.9% as the diversity and strength of training data augmentations increase.
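To make the idea of augmenting vector geometries concrete, the following is a minimal illustrative sketch of randomly perturbing polygons by shifting, rotating, scaling, or dropping them, which mirrors the kinds of anomalies listed above (shifted, malformed, or missing geometries). It is not the paper's RandPolyAugment algorithm, which the abstract does not specify. The function names (`shift`, `rotate`, `scale`, `random_poly_perturb`), the parameters, and their default values are all hypothetical choices made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def centroid(poly: Sequence[Point]) -> Point:
    """Mean of the vertices (a simple stand-in for the area centroid)."""
    n = len(poly)
    return (sum(x for x, _ in poly) / n, sum(y for _, y in poly) / n)


def shift(poly: Sequence[Point], dx: float, dy: float) -> Polygon:
    """Translate every vertex by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in poly]


def rotate(poly: Sequence[Point], angle_rad: float) -> Polygon:
    """Rotate the polygon counter-clockwise about its vertex centroid."""
    cx, cy = centroid(poly)
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [
        (cx + c * (x - cx) - s * (y - cy), cy + s * (x - cx) + c * (y - cy))
        for x, y in poly
    ]


def scale(poly: Sequence[Point], factor: float) -> Polygon:
    """Scale the polygon about its vertex centroid."""
    cx, cy = centroid(poly)
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in poly]


def random_poly_perturb(
    polys: Sequence[Sequence[Point]],
    rng: random.Random,
    p: float = 0.5,          # assumed probability of perturbing each polygon
    p_drop: float = 0.1,     # assumed probability of dropping a perturbed polygon
    max_shift: float = 5.0,  # assumed maximum shift, in coordinate units
    max_angle: float = math.pi / 6,
    scale_range: Tuple[float, float] = (0.7, 1.3),
) -> List[Polygon]:
    """Return a perturbed copy of `polys`; the input is left unmodified."""
    out: List[Polygon] = []
    for poly in polys:
        if rng.random() >= p:
            out.append(list(poly))  # keep this polygon unchanged
            continue
        if rng.random() < p_drop:
            continue  # simulate a missing geometry
        op = rng.choice(("shift", "rotate", "scale"))
        if op == "shift":
            dx = rng.uniform(-max_shift, max_shift)
            dy = rng.uniform(-max_shift, max_shift)
            out.append(shift(poly, dx, dy))
        elif op == "rotate":
            out.append(rotate(poly, rng.uniform(-max_angle, max_angle)))
        else:
            out.append(scale(poly, rng.uniform(*scale_range)))
    return out
```

In a pipeline like the one described, the perturbed geometries would then be rasterized into one channel of the image-like tensor while the other modalities stay untouched, so that the perturbed channel is no longer corroborated by the rest. Passing a seeded `random.Random` keeps the perturbations reproducible.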
