Timely detection of critical health conditions remains a major challenge in public health analytics, especially in Big Data environments characterized by high-volume, high-velocity, and highly varied clinical data. This study presents an ontology-enabled real-time analytics framework that integrates Complex Event Processing (CEP) and Large Language Models (LLMs) to support intelligent health event detection and semantic reasoning over heterogeneous, high-velocity health data streams. The architecture uses the Basic Formal Ontology (BFO) and the Semantic Web Rule Language (SWRL) to model domain knowledge and diagnostic rules. Patient data are ingested and processed with Apache Kafka and Spark Streaming, where CEP engines detect clinically significant event patterns. LLMs support adaptive reasoning, event interpretation, and ontology refinement. Clinical information is stored as Resource Description Framework (RDF) triples in GraphDB, enabling SPARQL-based querying and knowledge-driven decision support. The framework is evaluated on a dataset of 1,000 tuberculosis (TB) patients as a use case, demonstrating low-latency event detection, scalable reasoning, and high model performance as measured by precision, recall, and F1-score. These results support the system's potential for generalizable, real-time health analytics in complex Big Data scenarios.
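To make the CEP stage concrete, the following is a minimal sketch of a sequence pattern over a patient event stream, written in plain Python rather than against Kafka or Spark Streaming. It is not the framework's implementation: the `Event` and `SequenceDetector` names, the event kinds, the 14-day window, and the `ex:` identifiers are all hypothetical, and the cough-then-positive-sputum pattern is an illustrative placeholder rather than a diagnostic rule taken from the study. The sketch assumes events arrive in timestamp order for each patient and emits matches as subject-predicate-object tuples, mirroring the RDF triple shape described above.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    patient_id: str
    kind: str         # hypothetical event type, e.g. "cough" or "positive_sputum"
    timestamp: float  # seconds since an arbitrary epoch


class SequenceDetector:
    """Flags `first` followed by `second` for one patient within `window` seconds."""

    def __init__(self, first, second, window):
        self.first = first
        self.second = second
        self.window = window
        # patient_id -> timestamps of unmatched `first` events, oldest first
        self._pending = defaultdict(deque)

    def process(self, event):
        """Consume one event and return a list of (subject, predicate, object) triples."""
        pending = self._pending[event.patient_id]
        # Evict `first` events that are now too old to take part in a match.
        while pending and event.timestamp - pending[0] > self.window:
            pending.popleft()
        if event.kind == self.first:
            pending.append(event.timestamp)
            return []
        if event.kind == self.second and pending:
            pending.clear()  # consume the matched events so each match fires once
            subject = f"ex:patient/{event.patient_id}"  # hypothetical IRI scheme
            return [(subject, "ex:hasDetectedEvent", "ex:SuspectedTB")]
        return []


DAY = 86400.0
detector = SequenceDetector("cough", "positive_sputum", window=14 * DAY)
stream = [
    Event("p1", "cough", 0.0),
    Event("p2", "positive_sputum", 100.0),   # no preceding cough: no match
    Event("p1", "positive_sputum", 3 * DAY),  # within the window: match
]
alerts = [triple for event in stream for triple in detector.process(event)]
```

In a deployment along the lines described above, the per-patient state would presumably live in the streaming engine's keyed state and the emitted triples would be written to the triple store for SPARQL querying; both steps are outside the scope of this sketch.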