In this paper, a decimal first degree cellular automata (FDCA) based clustering algorithm is proposed in which clusters are created based on reachability. Cyclic spaces are created, and configurations that lie in the same cycle are treated as members of the same cluster. Real-life data objects are encoded into decimal strings using G\"odel number based encoding. The benefit of this scheme is that it reduces the encoded string length while maintaining the properties of the features. Candidate CA rules are identified based on theoretical criteria such as self-replication and information flow. An iterative algorithm is developed to generate the desired number of clusters over three stages. The clustering results are evaluated using benchmark clustering metrics such as the Silhouette score, the Davies-Bouldin index, the Calinski-Harabasz index, and the Dunn index. In comparison with existing state-of-the-art clustering algorithms, our proposed algorithm gives better performance.
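The idea of treating configurations in the same cycle as one cluster can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `step` and `cycle_clusters`, the three-cell lattice with periodic boundary, and the placeholder rule (each cell copies its left neighbour modulo 10), which is not one of the candidate rules the authors select.

```python
from itertools import product

D = 10  # decimal cell states 0..9


def step(config, coeffs=(1, 0, 0), const=0, d=D):
    """One synchronous update of a 1D periodic CA over {0, ..., d-1}.

    Hypothetical placeholder rule: a linear combination of the left
    neighbour, the cell itself and the right neighbour, taken modulo d.
    With the default coefficients each cell copies its left neighbour.
    """
    a, b, c = coeffs
    n = len(config)
    return tuple(
        (a * config[(i - 1) % n] + b * config[i] + c * config[(i + 1) % n] + const) % d
        for i in range(n)
    )


def cycle_clusters(space, step_fn):
    """Map every configuration to the id of the cycle its trajectory reaches."""
    label = {}
    next_id = 0
    for start in space:
        path = []      # configurations visited on this walk, in order
        seen = set()   # same configurations, for fast membership tests
        cur = start
        # Walk forward until we hit a labelled configuration or close a new cycle.
        while cur not in label and cur not in seen:
            seen.add(cur)
            path.append(cur)
            cur = step_fn(cur)
        if cur in label:
            cid = label[cur]   # trajectory joined a known cluster
        else:
            cid = next_id      # trajectory closed a previously unseen cycle
            next_id += 1
        for cfg in path:
            label[cfg] = cid
    return label


# Enumerate all 3-cell decimal configurations and cluster them by cycle.
space = list(product(range(D), repeat=3))
labels = cycle_clusters(space, step)
num_clusters = len(set(labels.values()))
```

Under this placeholder rule a configuration and its rotations, such as `(1, 2, 3)` and `(3, 1, 2)`, receive the same label because one update step maps the first to the second. Exhaustive enumeration is only feasible for tiny lattices; it is used here purely to make the reachability notion concrete.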