A Hybrid Chimp Optimization Algorithm and Generalized Normal Distribution Algorithm with Opposition-Based Learning Strategy for Solving Data Clustering Problems

February 16, 2023


Sayed Pedram Haeri Boroujeni, Elnaz Pashaei

From arXiv, 48 pages, 14 tables, 12 figures

This paper addresses data clustering, which separates data into groups based on the connectivity principle so that similar items fall into the same cluster and dissimilar items into different ones. Although classical clustering algorithms such as K-means are efficient, they often become trapped in local optima and converge slowly on high-dimensional problems. To address these issues, many meta-heuristic optimization algorithms and intelligence-based methods have been introduced to attain the optimal solution in a reasonable time. They are designed to escape local optima by allowing flexible movements or random behaviors. In this study, we develop an approach built on three main components: the Chimp Optimization Algorithm (ChOA), the Generalized Normal Distribution Algorithm (GNDA), and the Opposition-Based Learning (OBL) method. Firstly, two versions of ChOA with two different independent-group strategies and seven chaotic maps, entitled ChOA(I) and ChOA(II), are presented to achieve the best possible result for data clustering. Secondly, a novel combination of the ChOA and GNDA algorithms with the OBL strategy is devised to address the major shortcomings of the original algorithms. Lastly, the proposed ChOAGNDA method is a Selective Opposition (SO) algorithm based on ChOA and GNDA, which can be used to tackle large and complex real-world optimization problems, particularly data clustering applications. The results are evaluated against seven popular meta-heuristic optimization algorithms and eight recent state-of-the-art clustering techniques. Experimental results show that the proposed method significantly outperforms existing methods in minimizing the Sum of Intra-Cluster Distances (SICD), obtaining the lowest Error Rate (ER), accelerating convergence, and finding the optimal cluster centers.
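To make two of the terms above concrete, the following is a minimal sketch of an SICD objective and of the basic opposition rule that OBL builds on. It assumes the common definition of SICD as the sum of Euclidean distances from each point to its nearest cluster center, and it uses the plain opposite point `lower + upper - x` rather than the Selective Opposition variant used in the paper. The function names (`sicd`, `opposite`, `keep_better`) and the toy data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sicd(data, centers):
    """Sum of intra-cluster distances (assumed definition):
    each point contributes its Euclidean distance to the nearest center."""
    # dists[i, j] = distance from point i to center j
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return float(dists.min(axis=1).sum())

def opposite(candidate, lower, upper):
    """Basic opposition-based learning: reflect a candidate inside [lower, upper]."""
    return lower + upper - candidate

def keep_better(data, candidate, lower, upper):
    """Evaluate a candidate set of centers and its opposite; keep the lower-SICD one."""
    opp = opposite(candidate, lower, upper)
    return candidate if sicd(data, candidate) <= sicd(data, opp) else opp

# Toy data: two well-separated pairs of points (hypothetical).
data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
lower, upper = data.min(), data.max()  # scalar search bounds, for simplicity

best = keep_better(data, centers, lower, upper)
print(sicd(data, centers))  # distance sum for the original candidate
```

In a meta-heuristic such as ChOA, each search agent would encode one candidate set of centers, and an objective like `sicd` would serve as the fitness to minimize; the opposition step simply offers a second candidate per agent at the cost of one extra evaluation.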
