We present a novel clustering algorithm, visClust, that is based on lower-dimensional data representations and visual interpretation. To this end, we design a transformation that allows the data to be represented by a binary integer array, enabling the further use of image processing methods to select a partition. Qualitative and quantitative analyses show that the algorithm obtains high accuracy (measured with an adjusted one-sided Rand index) and requires little runtime and RAM. We compare the results to six state-of-the-art algorithms; visClust outperforms them in most experiments, which confirms its quality. Moreover, the algorithm requires just one obligatory input parameter while allowing optimization via optional parameters. The code is made available on GitHub.
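To make the general idea of clustering through a binary array more concrete, the following is a minimal sketch and not the visClust implementation. It assumes the data are already two-dimensional, rasterizes them onto a fixed-size binary grid, and uses 8-connected component labeling as a stand-in for the "image processing methods" mentioned in the abstract. All function names (`rasterize`, `label_components`, `cluster_by_grid`) and the `size` parameter are hypothetical and chosen only for illustration.

```python
from collections import deque


def rasterize(points, size):
    """Map 2-D points onto a size x size binary grid (1 = occupied cell).

    Assumption: a plain min-max scaling stands in for the paper's
    transformation, which the abstract does not specify.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1.0  # avoid division by zero
    y_span = (max(ys) - y_min) or 1.0
    grid = [[0] * size for _ in range(size)]
    cells = []
    for x, y in points:
        col = min(int((x - x_min) / x_span * size), size - 1)
        row = min(int((y - y_min) / y_span * size), size - 1)
        grid[row][col] = 1
        cells.append((row, col))
    return grid, cells


def label_components(grid):
    """Label 8-connected regions of occupied cells; 0 marks background."""
    size = len(grid)
    labels = [[0] * size for _ in range(size)]
    current = 0
    for r in range(size):
        for c in range(size):
            if grid[r][c] == 1 and labels[r][c] == 0:
                current += 1
                labels[r][c] = current
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill
                    cr, cc = queue.popleft()
                    for dr in (-1, 0, 1):
                        for dc in (-1, 0, 1):
                            nr, nc = cr + dr, cc + dc
                            if (0 <= nr < size and 0 <= nc < size
                                    and grid[nr][nc] == 1
                                    and labels[nr][nc] == 0):
                                labels[nr][nc] = current
                                queue.append((nr, nc))
    return labels, current


def cluster_by_grid(points, size=8):
    """Assign each point the label of the connected region its cell lies in."""
    grid, cells = rasterize(points, size)
    labels, n_clusters = label_components(grid)
    return [labels[r][c] for r, c in cells], n_clusters


points = [(0, 0), (0.1, 0.1), (0.2, 0.0), (5, 5), (5.1, 5.2), (4.9, 5.0)]
point_labels, n_clusters = cluster_by_grid(points, size=8)
print(point_labels, n_clusters)
```

In this toy setting the grid resolution `size` plays the role of a single tuning parameter: a coarser grid merges nearby groups, while a finer grid splits them. How visClust actually chooses its representation and selects a partition is described in the paper and its GitHub code, not here.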