Extracting structured knowledge from unstructured data still faces practical limitations: entity and event extraction pipelines remain brittle, knowledge graph construction requires costly ontology engineering, and cross-domain generalization is rarely production-ready. In contrast, space and time provide universal contextual anchors that naturally align heterogeneous information and benefit downstream tasks such as retrieval and reasoning. We introduce \textbf{STIndex}, an end-to-end system that structures unstructured content into a multidimensional spatiotemporal data warehouse. Users define domain-specific analysis dimensions with configurable hierarchies, while large language models perform context-aware extraction and grounding. \textbf{STIndex} integrates document-level memory, geocoding correction, and quality validation, and offers an interactive analytics dashboard for visualization, clustering, burst detection, and entity network analysis. In evaluation on a public health benchmark, \textbf{STIndex} improves spatiotemporal entity extraction F1 by 4.37\% (GPT-4o-mini) and 3.60\% (Qwen3-8B). A live demonstration and open-source code are available at https://stindex.ai4wa.com/dashboard.
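To illustrate what user-defined dimensions with configurable hierarchies could look like in a spatiotemporal warehouse, the following is a minimal sketch and not the STIndex implementation. The level names, the `Fact` record, the `roll_up` helper, and the sample rows are all hypothetical and chosen only for illustration. The sketch assumes each extracted mention has already been grounded to a place path and a date, and it aggregates mentions at a chosen level of each hierarchy.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical hierarchies, ordered from coarse to fine (assumed, not from STIndex).
SPATIAL_LEVELS = ("country", "state", "city")
TEMPORAL_LEVELS = ("year", "month", "day")


@dataclass(frozen=True)
class Fact:
    """One extracted mention, already grounded to a place path and a date."""
    doc_id: str
    place: tuple  # (country, state, city)
    date: tuple   # (year, month, day)


def roll_up(facts, spatial_level, temporal_level):
    """Count facts per (place, date) cell at the requested hierarchy levels."""
    s = SPATIAL_LEVELS.index(spatial_level) + 1
    t = TEMPORAL_LEVELS.index(temporal_level) + 1
    # Truncating each path to its first s (or t) components moves up the hierarchy.
    return Counter((f.place[:s], f.date[:t]) for f in facts)


# Made-up sample rows, for illustration only.
FACTS = [
    Fact("d1", ("Australia", "WA", "Perth"), (2024, 3, 1)),
    Fact("d2", ("Australia", "WA", "Broome"), (2024, 3, 9)),
    Fact("d3", ("Australia", "NSW", "Sydney"), (2024, 4, 2)),
]

by_state_month = roll_up(FACTS, "state", "month")
```

Storing each dimension value as a coarse-to-fine path means that a roll-up is just a prefix truncation, so the same records can be aggregated at any combination of levels without re-running extraction.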