Forest-Chat: Adapting Vision-Language Agents for Interactive Forest Change Analysis

The increasing availability of high-resolution satellite imagery, together with advances in deep learning, creates new opportunities for forest monitoring workflows. Two central challenges in this domain are pixel-level change detection and semantic change interpretation, particularly for complex forest dynamics. While large language models (LLMs) are increasingly adopted for data exploration, their integration with vision-language models (VLMs) for remote sensing image change interpretation (RSICI) remains underexplored, especially beyond urban environments. This paper introduces Forest-Chat, an LLM-driven agent for forest change analysis that supports natural language querying across multiple RSICI tasks: change detection, change captioning, object counting, deforestation characterisation, and change reasoning. Forest-Chat combines a multi-level change interpretation (MCI) vision-language backbone with LLM-based orchestration, and incorporates zero-shot change detection via AnyChange as well as zero-shot change captioning and caption refinement using a multimodal LLM. To support adaptation and evaluation in forest environments, we introduce the Forest-Change dataset, comprising bi-temporal satellite imagery, pixel-level change masks, and semantic change captions produced through human annotation and rule-based methods. In the supervised setting, Forest-Chat achieves mIoU and BLEU-4 scores of 67.10% and 40.17% on Forest-Change, and 88.13% and 34.41% on LEVIR-MCI-Trees, a tree-focused subset of LEVIR-MCI. In the zero-shot setting, it achieves mIoU and BLEU-4 scores of 60.15% and 34.00% on Forest-Change, and 47.32% and 18.23% on LEVIR-MCI-Trees. Further experiments demonstrate the value of caption refinement for injecting geographic domain knowledge into supervised captions, and the system's limited label domain transfer onto JL1-CD-Trees. These findings indicate that interactive, LLM-driven systems can support accessible and interpretable forest change analysis.
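For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of how mean intersection-over-union is commonly computed for pixel-level change masks. It is illustrative only: the function name `mean_iou`, the two-class (no-change / change) label scheme, and the choice to skip classes absent from both masks are assumptions, not details of the evaluation protocol used for Forest-Chat.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int = 2) -> float:
    """Mean IoU over classes for two flattened label masks.

    Assumption: labels are integers in [0, num_classes), e.g. 0 = no change,
    1 = change. Classes that appear in neither mask are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both masks (intersection) or in either (union).
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Toy 6-pixel example: each class has 2 intersecting pixels and a union of 4.
pred_mask = [0, 0, 1, 1, 1, 0]
true_mask = [0, 1, 1, 1, 0, 0]
print(f"mIoU = {mean_iou(pred_mask, true_mask):.2%}")
```

In practice the masks would be 2D arrays flattened per image or accumulated over a whole test set; whether IoU is averaged per image or computed from dataset-level pixel counts changes the reported value, so the two conventions should not be mixed when comparing results.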
