Unimodality is a key property indicating that data are grouped around a single mode of their density. We propose a method that partitions univariate data into unimodal subsets by recursively splitting them around valley points of the data density. For valley point detection, we introduce properties of critical points on the convex hull of the empirical cumulative distribution function (ecdf) plot that indicate the existence of density valleys. Next, we apply a unimodal data modeling approach that provides a statistical model for each obtained unimodal subset in the form of a Uniform Mixture Model (UMM). Consequently, a hierarchical statistical model of the initial dataset is obtained in the form of a mixture of UMMs, named the Unimodal Mixture Model (UDMM). The proposed method is non-parametric and hyperparameter-free, automatically estimates the number of unimodal subsets, and, as experimental results on clustering and density estimation tasks indicate, provides accurate statistical models.
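The abstract does not state the critical-point criterion itself. The sketch below therefore covers only the geometric ingredient it builds on: the ecdf plot points and the lower (convex) and upper (concave) chains of their convex hull. It relies on a standard fact: a univariate distribution is unimodal about a mode m exactly when its cdf is convex to the left of m and concave to the right. This is why hull chains of the ecdf carry information about modes and valleys. The function names `ecdf_points` and `hull_chains` are hypothetical. The ecdf plot is assumed to be represented by the points (x_(i), i/n). Neither the valley-detection rule nor the UMM fitting is reproduced here.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def ecdf_points(data: List[float]) -> List[Point]:
    """Points (x, F_n(x)) of the ecdf at each distinct sorted sample value."""
    xs = sorted(data)
    n = len(xs)
    pts: List[Point] = []
    for i, x in enumerate(xs):
        if pts and pts[-1][0] == x:
            # Tied value: keep only the top of the jump, F_n(x) = (#samples <= x) / n
            pts[-1] = (x, (i + 1) / n)
        else:
            pts.append((x, (i + 1) / n))
    return pts


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_chains(points: List[Point]) -> Tuple[List[Point], List[Point]]:
    """Lower (convex) and upper (concave) hull chains, both ordered left to right.

    Monotone-chain construction; collinear interior points are dropped.
    """
    lower: List[Point] = []
    upper: List[Point] = []
    for p in sorted(points):
        # Lower chain: pop until the last turn is strictly counter-clockwise.
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
        # Upper chain: pop until the last turn is strictly clockwise.
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) >= 0:
            upper.pop()
        upper.append(p)
    return lower, upper


if __name__ == "__main__":
    # Toy sample: two well-separated groups with an empty gap between 1 and 10.
    pts = ecdf_points([11, 0, 10, 1])
    lower, upper = hull_chains(pts)
    print("lower (convex) chain:", lower)
    print("upper (concave) chain:", upper)
```

On this toy sample the upper chain has a vertex at x = 1, where the first group ends. The lower chain has a vertex at x = 10, where the second group begins. Together, these hull vertices bracket the empty gap between the groups. How the proposed method turns such hull points into a valley decision is not specified in the abstract.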