A rank-invariant clustering of variables is introduced that is based on the predictive strength between groups of variables, i.e., two groups are assigned a high similarity if the variables in the first group contain high predictive information about the behaviour of the variables in the other group and/or vice versa. The method presented here is model-free, dependence-based and does not require any distributional assumptions. Various general invariance and continuity properties are investigated, with special attention to those that are beneficial for the agglomerative hierarchical clustering procedure. A fully non-parametric estimator is considered whose excellent performance is demonstrated in several simulation studies and by means of real-data examples.
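The clustering idea described above can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the estimator, so everything below is an assumption made for illustration: Chatterjee's rank correlation serves as a stand-in for a rank-invariant, directed measure of predictive strength, the two directions are symmetrised by taking their maximum, and groups are compared by average linkage over pairwise similarities rather than by a genuine group-to-group measure. The function names `ranks`, `xi`, `similarity` and `cluster` are hypothetical.

```python
import math
import random


def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order):
        r[i] = pos + 1
    return r


def xi(x, y):
    """Chatterjee's rank correlation of y given x (no-ties formula).

    Stand-in for a directed predictive-strength measure: close to 1 when
    y is (nearly) a function of x, close to 0 under independence.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])  # sort observations by x
    ry = ranks(y)
    total = sum(abs(ry[order[k + 1]] - ry[order[k]]) for k in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


def similarity(x, y):
    """High if x predicts y and/or y predicts x (assumed symmetrisation)."""
    return max(xi(x, y), xi(y, x), 0.0)


def cluster(columns, n_clusters):
    """Agglomerative clustering of variables with average linkage."""
    p = len(columns)
    sim = [[1.0 if i == j else similarity(columns[i], columns[j])
            for j in range(p)] for i in range(p)]
    groups = [[i] for i in range(p)]  # start with one group per variable
    while len(groups) > n_clusters:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                pairs = [sim[i][j] for i in groups[a] for j in groups[b]]
                s = sum(pairs) / len(pairs)
                if best is None or s > best[0]:
                    best = (s, a, b)
        _, a, b = best
        groups[a] = groups[a] + groups[b]  # merge the most similar pair
        del groups[b]
    return groups


# Toy data: variable 1 is a non-monotone function of variable 0,
# variable 3 is a non-monotone function of variable 2.
rng = random.Random(0)
n = 500
u = [rng.random() for _ in range(n)]
v = [rng.random() for _ in range(n)]
columns = [u, [(t - 0.5) ** 2 for t in u], v, [math.cos(6 * t) for t in v]]
groups = cluster(columns, 2)
print(groups)
```

Because the stand-in measure depends on the data only through ranks, applying a strictly increasing transformation to any variable leaves the resulting grouping unchanged, which mirrors the rank-invariance stated in the abstract. The toy relationships are deliberately non-monotone, a case in which a linear correlation would typically be close to zero.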