Generating learning-friendly representations for points in space is a fundamental and long-standing problem in machine learning. Recently, multi-scale encoding schemes (such as Space2Vec and NeRF) were proposed to directly encode any point in 2D/3D Euclidean space as a high-dimensional vector, and they have been successfully applied to various geospatial prediction and generative tasks. However, all current 2D and 3D location encoders are designed to model point distances in Euclidean space. When applied to large-scale real-world GPS coordinate datasets, which require distance metric learning on the spherical surface, both types of models can fail because of the map projection distortion problem (2D) and the spherical-to-Euclidean distance approximation error (3D). To solve these problems, we propose a multi-scale location encoder called Sphere2Vec, which can preserve spherical distances when encoding point coordinates on a spherical surface. We develop a unified view of distance-preserving encoding on spheres based on the Double Fourier Sphere (DFS). We also provide a theoretical proof that Sphere2Vec preserves the spherical surface distance between any two points, while existing encoding schemes do not. Experiments on 20 synthetic datasets show that Sphere2Vec outperforms all baseline models on all of these datasets, with up to a 30.8% error rate reduction. We then apply Sphere2Vec to three geo-aware image classification tasks: fine-grained species recognition, Flickr image recognition, and remote sensing image classification. Results on 7 real-world datasets show that Sphere2Vec outperforms multiple location encoders on all three tasks. Further analysis shows that the advantage of Sphere2Vec over other location encoders is largest in polar regions and data-sparse areas, because it preserves spherical surface distance by design. Code and data are available at https://gengchenmai.github.io/sphere2vec-website/.
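
To make the two failure modes named above concrete, the following minimal sketch compares the great-circle distance between two GPS points with (a) the Euclidean distance after treating longitude and latitude as planar coordinates, and (b) the straight-line 3D chord between the points. It is an illustration only and not part of Sphere2Vec; the function names, the equirectangular projection, and the mean Earth radius of 6371 km are assumptions chosen for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, for illustration only


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def planar_km(lat1, lon1, lat2, lon2):
    """Euclidean distance after an equirectangular projection (2D failure mode)."""
    dx = math.radians(lon2 - lon1) * EARTH_RADIUS_KM
    dy = math.radians(lat2 - lat1) * EARTH_RADIUS_KM
    return math.hypot(dx, dy)


def chord_km(lat1, lon1, lat2, lon2):
    """Straight-line distance through the sphere in 3D (3D failure mode)."""
    def to_xyz(lat, lon):
        phi, lam = math.radians(lat), math.radians(lon)
        return (math.cos(phi) * math.cos(lam),
                math.cos(phi) * math.sin(lam),
                math.sin(phi))
    return EARTH_RADIUS_KM * math.dist(to_xyz(lat1, lon1), to_xyz(lat2, lon2))


if __name__ == "__main__":
    # Two points near the pole, 90 degrees of longitude apart.
    polar = (80.0, 0.0, 80.0, 90.0)
    # Two antipodal points on the equator.
    antipodal = (0.0, 0.0, 0.0, 180.0)
    for name, pts in [("polar", polar), ("antipodal", antipodal)]:
        print(name,
              round(great_circle_km(*pts), 1),
              round(planar_km(*pts), 1),
              round(chord_km(*pts), 1))
```

Under these assumptions, the planar distance for the polar pair is several times larger than the great-circle distance, which reflects projection distortion at high latitudes. The chord always underestimates the great-circle distance, and the gap grows as the points move farther apart.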