The investigation of mixture models is key to understanding and visualizing the distribution of multivariate data. Most mixture model approaches are based on likelihoods and are ill-suited to distributions with finite support or without a well-defined density function. This study proposes the Augmented Quantization method, a reformulation of the classical quantization problem that uses the p-Wasserstein distance. This metric can be computed in very general distribution spaces, in particular between distributions with different supports. The clustering interpretation of quantization is revisited in this more general framework. The performance of Augmented Quantization is first demonstrated on analytical toy problems. It is then applied to a practical case study of river flooding, in which mixtures of Dirac and uniform distributions are built in the input space, enabling the identification of the most influential variables.
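To illustrate why the p-Wasserstein distance remains usable when a likelihood is not, the following minimal sketch computes it in one dimension between a Dirac mass (which has no density) and a uniform distribution, and between two uniforms with disjoint supports. It relies on the standard one-dimensional identity W_p^p = ∫_0^1 |F^{-1}(u) - G^{-1}(u)|^p du. The function names (`wasserstein_1d`, `dirac_quantile`, `uniform_quantile`) and the midpoint-rule discretization are assumptions made for this illustration; they are not taken from the Augmented Quantization method itself.

```python
import numpy as np


def wasserstein_1d(quantile_a, quantile_b, p=2, n_grid=10_000):
    """Approximate the 1D p-Wasserstein distance from two quantile functions.

    Uses the identity W_p^p = integral over (0, 1) of |Qa(u) - Qb(u)|^p du,
    discretized with a midpoint rule (an illustrative choice).
    """
    u = (np.arange(n_grid) + 0.5) / n_grid  # midpoints of a regular grid on (0, 1)
    cost = np.abs(quantile_a(u) - quantile_b(u)) ** p
    return float(np.mean(cost) ** (1.0 / p))


def dirac_quantile(location):
    # The quantile function of a Dirac mass is constant.
    return lambda u: np.full_like(u, location, dtype=float)


def uniform_quantile(low, high):
    # The quantile function of Uniform[low, high] is linear in u.
    return lambda u: low + (high - low) * u


# Dirac at 0.5 versus Uniform[0, 1]: exact value is sqrt(1/12), about 0.2887.
w_dirac_uniform = wasserstein_1d(dirac_quantile(0.5), uniform_quantile(0.0, 1.0), p=2)

# Two uniforms with disjoint supports: one is the other shifted by 2, so W_p = 2.
w_disjoint = wasserstein_1d(uniform_quantile(0.0, 1.0), uniform_quantile(2.0, 3.0), p=2)

print(w_dirac_uniform, w_disjoint)
```

Both distances are finite and informative, whereas a likelihood-based comparison would be undefined for the Dirac mass and degenerate for the disjoint supports.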