Retrieval and recommendation are two essential tasks in modern search tools. This paper introduces a novel retrieval-reranking framework that leverages Large Language Models (LLMs) to mine spatiotemporal and semantic associations among unusual climate and environmental events described in news articles and web posts, and to recommend related events. The framework uses advanced natural language processing techniques to address the high labor cost and limited scalability of traditional manual curation. Specifically, we explore how best to employ state-of-the-art embedding models to semantically analyze spatiotemporal events (news), and we propose a Geo-Time Re-ranking (GT-R) strategy that integrates multiple criteria, namely spatial proximity, temporal association, semantic similarity, and category-instructed similarity, to rank and identify similar spatiotemporal events. We apply the proposed framework to a dataset of four thousand Local Environmental Observer (LEO) Network events, where it achieves the best performance in recommending similar events when compared with multiple state-of-the-art dense retrieval models. The search and recommendation pipeline can be applied to a wide range of similar search tasks that involve geospatial and temporal data. We hope that linking relevant events will help the general public better understand climate change and its impact on different communities.
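To make the idea of combining the four GT-R criteria concrete, the following is a minimal sketch of one possible re-ranking step, not the scoring function used in this paper. The abstract does not state how the criteria are combined, so the weighted sum, the exponential decay for distance and time, the scale constants, and all names (`Event`, `rerank`, `km_scale`, `day_scale`) are assumptions made for illustration. Embeddings are assumed to come from any dense retrieval model, with the category-instructed embedding produced by prompting the model with a category-oriented instruction.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class Event:
    """Hypothetical event record; field names are illustrative only."""
    event_id: str
    lat: float
    lon: float
    day: date
    embedding: List[float]           # semantic embedding of the event text
    category_embedding: List[float]  # embedding from a category-oriented instruction


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def rerank(
    query: Event,
    candidates: List[Event],
    weights: Tuple[float, float, float, float] = (0.4, 0.2, 0.2, 0.2),  # assumed values
    km_scale: float = 500.0,   # assumed distance decay scale
    day_scale: float = 30.0,   # assumed time decay scale
) -> List[Tuple[float, str]]:
    """Return (score, event_id) pairs sorted from most to least similar."""
    w_sem, w_cat, w_geo, w_time = weights
    scored = []
    for cand in candidates:
        sem = cosine(query.embedding, cand.embedding)
        cat = cosine(query.category_embedding, cand.category_embedding)
        # Map distance and time gap into (0, 1]: closer events score higher.
        geo = math.exp(-haversine_km(query.lat, query.lon, cand.lat, cand.lon) / km_scale)
        tmp = math.exp(-abs((query.day - cand.day).days) / day_scale)
        score = w_sem * sem + w_cat * cat + w_geo * geo + w_time * tmp
        scored.append((score, cand.event_id))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In a retrieval-reranking pipeline of this kind, `candidates` would typically be the top results returned by a first-stage dense retriever, and `rerank` would reorder only that short list. A weighted sum is used here purely because it is simple to inspect; other combination schemes are equally plausible.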