Data cohesion, a recently introduced measure inspired by social interactions, uses distance comparisons to assess relative proximity. In this work, we provide a collection of results that can guide the development of cohesion-based methods in exploratory data analysis and human-aided computation. We observe the important role of highly clustered "point-like" sets and the ways in which cohesion allows such sets to take on the qualities of a single weighted point. In doing so, we see how cohesion complements metric-adjacent measures of dissimilarity and responds to local density. We conclude by proving that cohesion is the unique function with (i) average value equal to one-half and (ii) the property that the influence of an outlier is proportional to its mass. Properties of cohesion are illustrated with examples throughout.
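To make the phrase "uses distance comparisons" and property (i) concrete, the following is a minimal sketch and not the authors' implementation. It assumes the usual definition of cohesion from the partitioned local depth literature, which this abstract does not restate: for each pair (x, y), the local focus is the set of points within d(x, y) of x or of y, and each point in the focus that is closer to x than to y lends x support of 1/|focus|, averaged over the n - 1 choices of y. The function name `cohesion_matrix`, the even splitting of ties, and the random test data are illustrative assumptions.

```python
import numpy as np

def cohesion_matrix(D):
    """Cohesion C[x, w] from a symmetric distance matrix D (assumed definition)."""
    n = D.shape[0]
    C = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            if x == y:
                continue
            # Local focus: points within d(x, y) of x or of y.
            focus = (D[:, x] <= D[x, y]) | (D[:, y] <= D[x, y])
            size = focus.sum()
            # Focus points strictly closer to x support x; ties are split evenly
            # (an assumption made here so that the totals balance).
            closer = focus & (D[:, x] < D[:, y])
            tied = focus & (D[:, x] == D[:, y])
            C[x, closer] += 1.0 / size
            C[x, tied] += 0.5 / size
    # Average over the n - 1 possible choices of y.
    return C / (n - 1)

# Hypothetical data: 12 random points in the plane.
rng = np.random.default_rng(0)
pts = rng.normal(size=(12, 2))
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

C = cohesion_matrix(D)
local_depth = C.sum(axis=1)      # total cohesion received by each point
mean_depth = local_depth.mean()  # expected to be one-half under this definition
```

Under this assumed definition, only comparisons of distances enter the computation, never their magnitudes, and each unordered pair (x, y) distributes exactly one unit of support between x and y, which is why the row sums average to one-half.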