Density estimation is a fundamental technique, used across many fields, for modeling and understanding the underlying distribution of data. Its primary objective is to estimate the probability density function of a random variable. It is valuable for both univariate and multivariate data and is essential for tasks such as clustering, anomaly detection, and generative modeling. In this paper, we propose a univariate density approximation based on spline quasi-interpolation and apply it to clustering. The clustering technique relies on constructing suitable multivariate distributions from estimates of the univariate empirical densities (the marginals). These marginals are approximated with the proposed spline quasi-interpolant, while the joint distributions that model the sought clustering partition are constructed with copula functions. In particular, since copulas capture the dependence between the features of the data independently of the marginal distributions, we propose a finite mixture copula model. The algorithm is validated on artificial and real datasets.
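As a rough illustration of the marginal-estimation step, the following sketch smooths a histogram with a uniform cubic B-spline discrete quasi-interpolant, using the classical coefficient functional λ_j = (−f_{j−1} + 8 f_j − f_{j+1})/6, which is exact on cubic polynomials. This is an assumed construction chosen for illustration, not the specific quasi-interpolant proposed in the paper; the function names `cubic_bspline` and `qi_density`, the default bin count, and the padding are hypothetical choices.

```python
import numpy as np


def cubic_bspline(t):
    """Centered cardinal cubic B-spline with support [-2, 2]."""
    t = np.abs(t)
    out = np.zeros_like(t, dtype=float)
    inner = t < 1.0
    outer = (t >= 1.0) & (t < 2.0)
    out[inner] = 2.0 / 3.0 - t[inner] ** 2 + 0.5 * t[inner] ** 3
    out[outer] = (2.0 - t[outer]) ** 3 / 6.0
    return out


def qi_density(samples, n_bins=40, pad=2):
    """Hypothetical univariate density estimate: histogram + cubic spline quasi-interpolation."""
    samples = np.asarray(samples, dtype=float)
    lo, hi = samples.min(), samples.max()
    if hi <= lo:
        raise ValueError("samples must not all be equal")
    h = (hi - lo) / n_bins
    # Extend the grid by `pad` bins per side so the spline tails can decay to zero.
    edges = lo + h * np.arange(-pad, n_bins + pad + 1)
    counts, _ = np.histogram(samples, bins=edges)
    f = counts / (samples.size * h)          # histogram density at bin centres
    centres = 0.5 * (edges[:-1] + edges[1:])
    # Discrete quasi-interpolant coefficients, with zero values outside the grid.
    fp = np.pad(f, 1)
    coef = (-fp[:-2] + 8.0 * fp[1:-1] - fp[2:]) / 6.0

    def density(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        basis = cubic_bspline((x[:, None] - centres[None, :]) / h)
        # The quasi-interpolant can dip slightly below zero; clip to keep a valid density.
        return np.clip(basis @ coef, 0.0, None)

    return density
```

Because each scaled B-spline integrates to h and the coefficients sum to the histogram values, the unclipped estimate integrates to one; clipping perturbs this only slightly. For example, `qi_density(np.random.default_rng(0).normal(size=20000))(0.0)` should be close to the standard normal peak 1/√(2π) ≈ 0.399.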
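The role of the copulas can be made concrete with Sklar's theorem, which factors a joint density into a copula density and the univariate marginals; the marginals F_i and f_i are where a quasi-interpolated estimate would enter. The mixture form and the maximum-posterior assignment rule shown here are one common way to set up a finite mixture copula model for clustering, assumed for illustration; the paper's exact parametrization (for instance, whether the marginals are shared across components) may differ.

```latex
% Sklar's theorem: joint CDF and density from the marginals and a copula C with density c
F(x_1,\dots,x_d) = C\bigl(F_1(x_1),\dots,F_d(x_d)\bigr), \qquad
f(x_1,\dots,x_d) = c\bigl(F_1(x_1),\dots,F_d(x_d)\bigr)\prod_{i=1}^{d} f_i(x_i).

% A K-component finite mixture copula model, with x = (x_1,\dots,x_d)
% (component-specific marginals F_{ki}, f_{ki} assumed here)
f(\mathbf{x}) = \sum_{k=1}^{K} \pi_k\, c_k\bigl(F_{k1}(x_1),\dots,F_{kd}(x_d);\theta_k\bigr)\prod_{i=1}^{d} f_{ki}(x_i),
\qquad \pi_k \ge 0,\quad \sum_{k=1}^{K}\pi_k = 1.

% Clustering rule: assign x to the component with the largest posterior weight
\hat{k}(\mathbf{x}) = \arg\max_{k}\ \pi_k\, c_k\bigl(F_{k1}(x_1),\dots,F_{kd}(x_d);\theta_k\bigr)\prod_{i=1}^{d} f_{ki}(x_i).
```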