Development of Semantics-Based Distributed Middleware for Heterogeneous Data Integration and its Application for Drought

Drought is a complex environmental phenomenon that affects millions of people and communities across the globe and is too elusive to be predicted accurately. This is mostly due to the scale and variability of the web of environmental parameters that directly or indirectly cause the onset of different categories of drought. Since the dawn of humankind, people have tried to understand the natural indicators that signal likely environmental events, and these indicators, in the form of indigenous knowledge systems, have been used for generations. The intricate complexity of drought has, however, always been a major stumbling block for accurate drought prediction and forecasting systems. Recently, scientists in agriculture and environmental monitoring have been discussing the integration of indigenous knowledge with scientific knowledge, so that forecasting systems can draw on more diverse environmental information and deliver more reliable drought forecasts. Hence, the core objective of this research is the development of a semantics-based data integration middleware that integrates the heterogeneous data models of local indigenous knowledge and sensor data, towards an accurate drought forecasting system for the study areas. Local indigenous knowledge on drought, gathered from domain experts, is transformed into rules that the middleware's automated inference generation module uses, in conjunction with sensor data, to perform deductive inference and determine the onset of drought. Among other components, the middleware's distributed architecture comprises a streaming data processing engine based on Apache Kafka for real-time stream processing, a rule-based reasoning module, and an ontology module for the semantic representation of the knowledge bases.
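To make the idea of combining expert-derived rules with sensor data concrete, the sketch below shows a minimal forward-chaining reasoner in Python. It is an illustration under stated assumptions, not the middleware's actual reasoning module. The names `Rule`, `sensor_facts`, `forward_chain` and `assess`, the sensor thresholds, and the indicator facts `ik_indicator_a_observed` / `ik_indicator_b_observed` are all hypothetical placeholders, not indicators or values taken from the study. Raw readings are mapped to symbolic facts, merged with observed indigenous-knowledge facts, and rules are applied until no new fact can be derived. The Python list of readings stands in for the Kafka stream described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """An if-then rule: when all premises hold, the conclusion is derived."""
    name: str
    premises: frozenset[str]
    conclusion: str


def sensor_facts(reading: dict[str, float]) -> set[str]:
    """Map one raw sensor reading to symbolic facts.

    The thresholds are hypothetical, chosen only for illustration.
    """
    facts = set()
    if reading["soil_moisture_pct"] < 15.0:
        facts.add("low_soil_moisture")
    if reading["air_temp_c"] > 32.0:
        facts.add("high_temperature")
    return facts


def forward_chain(facts: set[str], rules: list[Rule]) -> set[str]:
    """Apply rules repeatedly until no new fact is derived (a fixpoint)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.premises <= derived and rule.conclusion not in derived:
                derived.add(rule.conclusion)
                changed = True
    return derived


# Hypothetical rule base: one rule encodes an expert (indigenous-knowledge)
# sign, one encodes sensor evidence, and one combines both.
RULES = [
    Rule("onset", frozenset({"ik_dry_season_expected", "moisture_stress"}),
         "drought_onset_likely"),
    Rule("ik_sign", frozenset({"ik_indicator_a_observed", "ik_indicator_b_observed"}),
         "ik_dry_season_expected"),
    Rule("sensor_stress", frozenset({"low_soil_moisture", "high_temperature"}),
         "moisture_stress"),
]


def assess(stream, observations: set[str]) -> list[bool]:
    """For each reading in the stream, report whether drought onset is inferred."""
    results = []
    for reading in stream:
        facts = sensor_facts(reading) | observations
        results.append("drought_onset_likely" in forward_chain(facts, RULES))
    return results


# Usage: a two-element list stands in for a real-time sensor stream.
readings = [
    {"soil_moisture_pct": 22.0, "air_temp_c": 28.0},
    {"soil_moisture_pct": 11.5, "air_temp_c": 34.0},
]
ik_obs = {"ik_indicator_a_observed", "ik_indicator_b_observed"}
print(assess(readings, ik_obs))  # [False, True]
```

The "onset" rule is deliberately listed first, so that reaching the conclusion depends on the loop running to a fixpoint rather than on rule order. Translating numeric readings into the same symbolic vocabulary as the expert rules loosely mirrors the role a shared ontology plays when integrating heterogeneous sources.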
