Bi-temporal satellite imagery supports critical applications such as urbanization monitoring and disaster assessment. Although powerful multimodal large language models~(MLLMs) have been applied to bi-temporal change analysis, previous methods process image pairs by direct concatenation, which inadequately models temporal correlations and spatial semantic changes. This deficiency hampers visual-semantic alignment in change understanding and thereby limits the effectiveness of current approaches. To address this gap, we propose BTCChat, a multi-temporal MLLM with advanced bi-temporal change understanding capability. BTCChat supports bi-temporal change captioning while retaining single-image interpretation capability. To better capture temporal features and spatial semantic changes in image pairs, we design a Change Extraction module. Moreover, to strengthen the model's attention to spatial details, we introduce a Prompt Augmentation mechanism, which incorporates contextual clues into the prompt. Experimental results demonstrate that BTCChat achieves state-of-the-art performance on change captioning and visual question answering tasks. The code is available \href{https://github.com/IntelliSensing/BTCChat}{here}.