We present visClust, a novel clustering algorithm based on lower-dimensional data representations and visual interpretation. To this end, we design a transformation that represents the data as a binary integer array, which enables the use of image processing methods to select a partition. Qualitative and quantitative analyses, measured by accuracy and an adjusted Rand index, show that the algorithm performs well while requiring little runtime and RAM. We compare the results with six state-of-the-art algorithms with available code, and visClust achieves superior performance in most experiments, confirming its quality. Moreover, the algorithm requires only one mandatory input parameter while allowing further optimization via optional parameters. The code is available on GitHub and is straightforward to use.
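To make the general idea concrete, the following is a minimal sketch of clustering via a binary image, not the visClust implementation itself. It assumes the data are already two-dimensional, rasterizes them onto a 0/1 integer grid, and uses 8-connected component labeling as a stand-in for the image processing step. The function names (`rasterize`, `label_components`, `grid_cluster`) and the `grid_size` parameter are hypothetical and chosen only for illustration; the actual transformation and partition selection in visClust may differ.

```python
from collections import deque


def rasterize(points, grid_size):
    """Map 2-D points onto a grid_size x grid_size binary (0/1) integer array."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Guard against a zero range when all points share a coordinate.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    grid = [[0] * grid_size for _ in range(grid_size)]
    cells = []
    for x, y in points:
        col = min(int((x - x_min) / x_span * grid_size), grid_size - 1)
        row = min(int((y - y_min) / y_span * grid_size), grid_size - 1)
        grid[row][col] = 1
        cells.append((row, col))
    return grid, cells


def label_components(grid):
    """Label 8-connected regions of 1-cells (0 = background); return labels and count."""
    n_rows, n_cols = len(grid), len(grid[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    count = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] == 0 or labels[r][c] != 0:
                continue
            # Start a new component and flood-fill it breadth-first.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < n_rows and 0 <= nc < n_cols
                                and grid[nr][nc] == 1 and labels[nr][nc] == 0):
                            labels[nr][nc] = count
                            queue.append((nr, nc))
    return labels, count


def grid_cluster(points, grid_size=8):
    """Assign each point the label of the connected region its grid cell belongs to."""
    grid, cells = rasterize(points, grid_size)
    labels, count = label_components(grid)
    return [labels[r][c] for r, c in cells], count


if __name__ == "__main__":
    demo = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (10.0, 10.0), (9.9, 9.8), (9.8, 10.0)]
    assignments, n_clusters = grid_cluster(demo, grid_size=8)
    print(assignments, n_clusters)
```

In this sketch the grid resolution controls how close two points must be to fall into the same region: a coarser grid merges nearby groups, while a finer one splits them.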