K-means clustering is widely used in psychological and psychometric research to identify profiles, subgroups, and potential typologies, yet its classical formulation does not test whether such groups exist as latent psychological categories. Instead, K-means partitions multidimensional space into regions around centroids, favoring compact, approximately spherical clusters defined by geometric distance. In this paper, we examine this limitation through a sequence of controlled simulated datasets. We then extend the analysis to the SMARVUS dataset, a large international psychometric dataset comprising survey responses from university students across 35 countries, to evaluate whether similar geometric partitioning patterns emerge in empirical psychological data. By contrasting simulated and empirical data, we argue that K-means can produce stable and visually coherent clustering solutions even in continuous Gaussian latent spaces that contain no true subgroup structure.
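The central claim can be illustrated with a minimal sketch. This is not the simulation code used in the paper; it is a hypothetical example that assumes NumPy and scikit-learn, and the function name `cluster_structureless_gaussian`, the correlation `rho = 0.8`, the sample size, and the choice of `k = 2` are all illustrative assumptions. It draws one correlated bivariate Gaussian population with no subgroups, fits K-means twice from different random initializations, and measures how closely the two partitions agree using the adjusted Rand index.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def cluster_structureless_gaussian(n_samples=2000, rho=0.8, k=2, seed=0):
    """Fit K-means twice to a single Gaussian population with no subgroups.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # One continuous, correlated Gaussian cloud: by construction there is
    # exactly one population and no latent categories.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal(mean=np.zeros(2), cov=cov, size=n_samples)

    # Two independent fits that differ only in their random initialization.
    fit_a = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X)
    fit_b = KMeans(n_clusters=k, n_init=10, random_state=2).fit(X)

    # Agreement between the two partitions (1.0 means identical up to relabeling).
    agreement = adjusted_rand_score(fit_a.labels_, fit_b.labels_)

    # Number of observations assigned to each cluster in the first fit.
    sizes = np.bincount(fit_a.labels_, minlength=k)
    return agreement, sizes, fit_a.cluster_centers_


if __name__ == "__main__":
    agreement, sizes, centers = cluster_structureless_gaussian()
    print(f"Adjusted Rand index between runs: {agreement:.3f}")
    print(f"Cluster sizes: {sizes.tolist()}")
    print(f"Centroids:\n{centers}")
```

Under these assumptions, high agreement between the two runs would indicate only that K-means reproducibly splits the cloud along its main axis of variation, not that two psychological types exist. In an isotropic Gaussian the partition would be expected to be far less reproducible, since no direction of the cloud is preferred.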