Modern urban spaces are equipped with an increasingly diverse set of sensors, which together produce an abundance of multimodal data. Such data can be used to identify and reason about important incidents occurring in urban landscapes, such as major emergencies, cultural and social events, and natural disasters. However, this data may be fragmented across several sources and difficult to integrate, because identifying the relationships between the multimodal data corresponding to an incident, and understanding the different components that define an incident, relies on human-driven reasoning. These relationships and components are critical to identifying the causes of such incidents, as well as to forecasting the scale and intensity of future incidents as they begin to develop. In this work, we create SIGMUS, a system for Semantic Integration for Knowledge Graphs in Multimodal Urban Spaces. SIGMUS uses Large Language Models (LLMs) to produce the world knowledge needed to identify relationships between incidents occurring in urban spaces and data from different modalities, allowing us to organize evidence and observations relevant to an incident without relying on human-encoded rules for relating multimodal sensory data to incidents. This organized knowledge is represented as a knowledge graph that organizes incidents, observations, and more. We find that our system is able to produce reasonable connections between five different data sources (news article text, CCTV images, air quality, weather, and traffic measurements) and relevant incidents occurring at the same time and location.