A relational dataset is often analyzed by optimally assigning a label to each element through clustering or ordering. Although clustering and ordering methods can yield similar characterizations of a dataset, the former has been studied much more actively than the latter, particularly for data represented as graphs. This study fills this gap by investigating the methodological relationships between several clustering and ordering methods, focusing on spectral techniques. Furthermore, we evaluate the resulting performance of the clustering and ordering methods. To this end, we propose a measure called the label continuity error, which generically quantifies the degree of consistency between a sequence and a partition for a set of elements. Based on synthetic and real-world datasets, we evaluate the extent to which an ordering method identifies a module structure and a clustering method identifies a banded structure.
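As a minimal illustration of how a spectral technique can yield both an ordering and a clustering, the sketch below uses the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian) of a toy graph. Sorting its entries gives a sequence, and thresholding them at zero gives a two-way partition. The toy graph, the function names, and the helper `count_label_breaks` are assumptions introduced here for illustration. In particular, `count_label_breaks` is only a simple boundary count and is not the label continuity error defined in the paper.

```python
import numpy as np


def fiedler_vector(adjacency):
    """Eigenvector for the second-smallest eigenvalue of L = D - A."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs[:, 1]


def count_label_breaks(ordering, labels):
    """Hypothetical helper: number of positions along the ordering where
    the cluster label changes (NOT the paper's label continuity error)."""
    seq = labels[ordering]
    return int(np.sum(seq[1:] != seq[:-1]))


# Toy graph (an assumption): two 4-node cliques joined by one edge (3-4).
n = 8
A = np.zeros((n, n))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

v = fiedler_vector(A)
ordering = np.argsort(v)        # spectral ordering: sort elements by v
labels = (v > 0).astype(int)    # spectral bisection: split by sign of v
breaks = count_label_breaks(ordering, labels)

print("ordering:", ordering)
print("labels:  ", labels)
print("label changes along the ordering:", breaks)
```

Because both the sequence and the partition are derived from the same vector here, they agree by construction. Comparing an ordering and a partition produced by different methods is where a consistency measure becomes informative.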