Recently, large language models (LLMs), especially ChatGPT, have shown exceptional performance in language understanding, reasoning, and interaction, attracting users and researchers from many fields. Although LLMs have shown great capacity for human-like task accomplishment on natural language and natural images, their potential for remote sensing interpretation tasks has not yet been fully explored. Moreover, the lack of automation in remote sensing task planning limits the accessibility of remote sensing interpretation techniques, especially for researchers in other fields who are not remote sensing experts. To this end, we present Remote Sensing ChatGPT, an LLM-powered agent that uses ChatGPT to connect various AI-based remote sensing models to solve complicated interpretation tasks. More specifically, given a user request and a remote sensing image, we use ChatGPT to understand the request, plan tasks according to the functions of the available models, execute each subtask iteratively, and generate the final response from the outputs of the subtasks. Because LLMs are trained on natural language and cannot directly perceive the visual concepts contained in remote sensing images, we design visual cues that inject visual information into ChatGPT. With Remote Sensing ChatGPT, users can simply send a remote sensing image with the corresponding request and receive both the interpretation results and language feedback. Experiments and examples show that Remote Sensing ChatGPT can tackle a wide range of remote sensing tasks and can be extended to more tasks with more sophisticated models, such as remote sensing foundation models. The code and demo of Remote Sensing ChatGPT are publicly available at https://github.com/HaonanGuo/Remote-Sensing-ChatGPT .
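The workflow described above (visual cue, task planning, iterative subtask execution, response generation) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Tool`, `make_visual_cue`, `stub_planner`, and `run_agent` are hypothetical, the tools return canned strings instead of running real remote sensing models, and a keyword-matching stub stands in for the ChatGPT planner so that the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Tool:
    """A hypothetical wrapper around one remote sensing model."""
    name: str
    description: str
    run: Callable[[str], str]  # takes an image path, returns a text result


def make_visual_cue(image_path: str) -> str:
    # Assumption: a visual cue is text that describes the image for the LLM.
    # A real system might obtain it from a captioning model; this is a placeholder.
    return f"Image file: {image_path} (caption placeholder)"


def stub_planner(request: str, cue: str, tools: Dict[str, Tool]) -> List[str]:
    # Stand-in for the LLM planner: select every tool whose name occurs in the
    # request. A real planner would prompt the LLM with the request, the cue,
    # and the tool descriptions.
    text = request.lower()
    return [name for name in tools if name in text]


def run_agent(
    request: str,
    image_path: str,
    tools: Dict[str, Tool],
    planner: Callable[[str, str, Dict[str, Tool]], List[str]] = stub_planner,
) -> str:
    cue = make_visual_cue(image_path)      # inject visual information as text
    plan = planner(request, cue, tools)    # task planning
    outputs: List[Tuple[str, str]] = []
    for name in plan:                      # iterative subtask execution
        outputs.append((name, tools[name].run(image_path)))
    # Response generation: a real system would ask the LLM to summarize.
    summary = "; ".join(f"{n}: {o}" for n, o in outputs) or "no applicable tool"
    return f"Request: {request} | Results: {summary}"


# Canned tools used only for illustration.
TOOLS = {
    "detection": Tool("detection", "detect objects", lambda p: "3 objects found"),
    "segmentation": Tool("segmentation", "segment land cover", lambda p: "mask saved"),
}

if __name__ == "__main__":
    print(run_agent("Run detection on this image", "scene.png", TOOLS))
```

Replacing `stub_planner` with a function that queries an LLM, and the canned tools with real models, would preserve the same control flow.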