The rapid expansion of multi-source satellite imagery is driving innovation in Earth observation and opening new opportunities for Remote Sensing Foundation Models to harness diverse data. However, many existing models remain constrained by fixed spatial resolutions and patch sizes, which limits their ability to exploit the heterogeneous spatial characteristics of satellite imagery. To address this limitation, we propose FlexiMo, a flexible remote sensing foundation model that allows a pre-trained model to adapt to arbitrary spatial resolutions. Central to FlexiMo is a spatial resolution-aware module that uses a parameter-free alignment embedding mechanism to dynamically recalibrate patch embeddings according to the input image's resolution and dimensions. This design preserves critical token characteristics, maintains multi-scale feature fidelity, and enables efficient feature extraction without modifying the underlying network architecture. In addition, FlexiMo incorporates a lightweight channel adaptation module that leverages prior spectral information from the sensors, allowing the model to process images with varying numbers of channels while preserving the data's intrinsic physical properties. Extensive experiments on multimodal, multi-resolution, and multi-scale datasets show that FlexiMo improves model generalization and robustness, with strong performance across downstream tasks including scene classification, land cover classification, urban building segmentation, and cloud detection. By enabling parameter-efficient and physically consistent adaptation, FlexiMo paves the way for more adaptable and effective foundation models in real-world remote sensing applications.
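To make the idea of parameter-free recalibration of patch embeddings concrete, the following is a minimal NumPy sketch. It is an illustration under stated assumptions, not FlexiMo's actual mechanism: it swaps in the pseudo-inverse kernel resizing known from FlexiViT, and all function names (`linear_resize_matrix`, `resize_patch_kernel`, `resize_patch`, `embed_patch`) are hypothetical. The sketch resamples a pre-trained patch-embedding kernel to a larger patch size so that a patch upsampled by linear interpolation yields the same token as the original patch under the original kernel, without introducing any new learnable parameters.

```python
import numpy as np


def linear_resize_matrix(n_in: int, n_out: int) -> np.ndarray:
    """Return an (n_out, n_in) matrix that linearly interpolates a 1-D signal."""
    # Output sample positions in input coordinates (pixel-centre alignment),
    # clamped to the valid range so border samples replicate the edge value.
    pos = np.clip((np.arange(n_out) + 0.5) * n_in / n_out - 0.5, 0, n_in - 1)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n_in - 1)
    frac = pos - lo
    mat = np.zeros((n_out, n_in))
    mat[np.arange(n_out), lo] += 1.0 - frac
    mat[np.arange(n_out), hi] += frac
    return mat


def resize_patch(patch: np.ndarray, new_size: int) -> np.ndarray:
    """Bilinearly resize a (C, p, p) patch to (C, new_size, new_size)."""
    r = linear_resize_matrix(patch.shape[-1], new_size)
    return np.einsum("ai,cij,bj->cab", r, patch, r)


def resize_patch_kernel(kernel: np.ndarray, new_size: int) -> np.ndarray:
    """Resize a (D, C, p, p) patch-embedding kernel with a pseudo-inverse.

    Assumption (not taken from the abstract): the resized kernel w' is chosen
    so that <B x, w'> = <x, w> for every patch x, where B is the bilinear
    resize operator. This gives w' = pinv(B^T) w and adds no parameters.
    """
    d, c, p, _ = kernel.shape
    r = linear_resize_matrix(p, new_size)
    b = np.kron(r, r)                      # 2-D resize on row-major flattened patches
    pinv = np.linalg.pinv(b.T)             # shape (new_size**2, p**2)
    flat = kernel.reshape(d * c, p * p)    # one flattened 2-D filter per row
    return (flat @ pinv.T).reshape(d, c, new_size, new_size)


def embed_patch(patch: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Project a (C, p, p) patch to a D-dimensional token with a (D, C, p, p) kernel."""
    return np.einsum("cij,dcij->d", patch, kernel)


# Illustrative shapes only: 8-dim tokens, 3 channels, 4x4 patches resized to 8x8.
rng = np.random.default_rng(0)
kernel = rng.normal(size=(8, 3, 4, 4))
patch = rng.normal(size=(3, 4, 4))
big_kernel = resize_patch_kernel(kernel, 8)
token_small = embed_patch(patch, kernel)
token_large = embed_patch(resize_patch(patch, 8), big_kernel)
```

In this sketch the equality between `token_small` and `token_large` holds only when the patch size is increased, because the resize operator then has full column rank; when the patch size is reduced, the pseudo-inverse gives a least-squares approximation instead. How the target patch size would be derived from an image's ground sampling distance is left open here, since the abstract does not specify it.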