The Earth system comprises numerous interconnected spheres, of which climate is one. Because climate exerts both global and regional influence, understanding how it evolves in space and time is essential for improving knowledge and forecasts. Analyzing decades of climate data is a data mining challenge. Cluster analysis reduces the data volume and allows behavior to be analyzed cluster by cluster. Understanding invariant behavior is as crucial as understanding variable behavior. Gridded data from two sources, gridded IMD data and the CMIP5 HadCM3 decadal experiments, are studied using the K-Means and MiSTIC clustering techniques to explore spatiotemporal clustering of maximum and minimum temperatures. The boundaries of the K-Means clusters correspond with topography. The physiographic, climatic, and topographical characteristics of the Indian subcontinent affect MiSTIC's core areas. Both techniques yield overlapping clusters. The number of MiSTIC clusters varied significantly between the two datasets. The influence of the input data on this technique is evident in how the Himalayas are grouped in each dataset.
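To make the idea of clustering gridded temperature series concrete, the following is a minimal sketch of K-Means applied to a small synthetic grid. It is not the study's pipeline: the grid size, the "lowland" and "highland" temperature values, and all function and variable names are hypothetical, and the data are generated rather than taken from IMD or HadCM3. The initialization uses a deterministic farthest-first rule instead of the usual random seeding, purely so that the result is reproducible.

```python
import math
import random

def farthest_first_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point farthest from all centroids chosen so far (an assumption made
    here for reproducibility, not standard random K-Means seeding)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on a list of equal-length feature vectors."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each grid cell joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Synthetic grid (hypothetical values): 4 rows x 6 columns, 12 monthly
# maximum temperatures per cell. Columns 0-2 mimic a warm lowland,
# columns 3-5 a cold highland.
n_rows, n_cols, n_months = 4, 6, 12
rng = random.Random(0)
cells = []
for r in range(n_rows):
    for c in range(n_cols):
        base = 30.0 if c < 3 else 10.0
        series = [
            base + 5.0 * math.sin(2 * math.pi * m / n_months) + rng.gauss(0, 0.5)
            for m in range(n_months)
        ]
        cells.append(series)

labels, centroids = kmeans(cells, k=2)

# Print the cluster label of each cell as a map of the grid.
for r in range(n_rows):
    print("".join(str(labels[r * n_cols + c]) for c in range(n_cols)))
```

Each grid cell is treated as one sample whose features are its monthly values, so cells with similar temperature regimes fall into the same cluster regardless of where they sit on the grid. Spatial coherence of the resulting map then reflects the data rather than a constraint of the method, which is one way to read the correspondence between cluster boundaries and topography described above.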