Over the last decade, the Dip-test of unimodality has gained increasing interest in the data mining community, as it is a parameter-free statistical test that reliably assesses the modality of one-dimensional samples. It returns a so-called Dip-value and a corresponding probability for the sample's unimodality (Dip-p-value). These two values share a sigmoidal relationship; however, the specific transformation depends on the sample size. Many Dip-based clustering algorithms use bootstrapped look-up tables that translate Dip-values into Dip-p-values for only a limited number of sample sizes. We propose a specifically designed sigmoid function as a substitute for these state-of-the-art look-up tables. This accelerates computation and provides an approximation of the Dip- to Dip-p-value transformation for every sample size. Furthermore, the function is differentiable and can therefore easily be integrated into learning schemes that use gradient descent. We showcase this by exploiting our function in a novel subspace clustering algorithm called Dip'n'Sub. In extensive experiments, we highlight the various benefits of our proposal.
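To make the idea of a sample-size-aware, differentiable Dip- to Dip-p-value mapping concrete, the following is a minimal sketch and not the function proposed in the paper. It assumes a plain logistic curve applied to `sqrt(n) * dip`. The constants `MIDPOINT` and `STEEPNESS`, as well as the function names, are hypothetical placeholders chosen for illustration only; they are not fitted values.

```python
import math

# Placeholder shape parameters, chosen only for illustration.
# They are NOT the fitted values from the paper.
MIDPOINT = 0.65    # hypothetical: value of sqrt(n) * dip at which p = 0.5
STEEPNESS = 12.0   # hypothetical: slope factor of the sigmoid


def dip_to_p_value(dip: float, n: int) -> float:
    """Map a Dip-value to an approximate Dip-p-value with a decreasing sigmoid.

    Assumption of this sketch: the sample size n enters only through the
    rescaling sqrt(n) * dip.
    """
    z = STEEPNESS * (math.sqrt(n) * dip - MIDPOINT)
    # Numerically stable evaluation of 1 / (1 + exp(z)).
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def dip_to_p_value_grad(dip: float, n: int) -> float:
    """Closed-form derivative of dip_to_p_value with respect to dip."""
    p = dip_to_p_value(dip, n)
    return -STEEPNESS * math.sqrt(n) * p * (1.0 - p)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, dip_to_p_value(0.02, n), dip_to_p_value_grad(0.02, n))
```

Unlike a look-up table, such a closed-form expression can be evaluated for any sample size without interpolation, and its derivative is available analytically, which is what allows it to be used inside a gradient-based objective.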