The investigation of mixture models is key to understanding and visualizing the distribution of multivariate data. Most mixture-model approaches are based on likelihoods and are ill-suited to distributions with finite support or without a well-defined density function. This study proposes the Augmented Quantization method, a reformulation of the classical quantization problem that uses the p-Wasserstein distance. This metric can be computed in very general distribution spaces, in particular between distributions with different supports. The clustering interpretation of quantization is revisited in this more general framework. The performance of Augmented Quantization is first demonstrated on analytical toy problems. It is then applied to a practical case study of river flooding, in which mixtures of Dirac and uniform distributions are built in the input space, enabling identification of the most influential variables.
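To illustrate why the p-Wasserstein distance remains usable when two distributions do not share a support, the following minimal sketch compares a Dirac mass with a discretized uniform distribution in one dimension. It is not the Augmented Quantization method itself: the function name `wasserstein_1d` is hypothetical, and the code assumes two equal-size samples with uniform weights, for which the optimal coupling simply pairs the sorted values.

```python
def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two 1D empirical distributions.

    Assumption: both samples have the same size and uniform weights,
    so the optimal transport plan matches the sorted values one to one.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("p must be >= 1")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    # Average transport cost |x - y|^p over the matched pairs.
    cost = sum(abs(x - y) ** p for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return cost ** (1.0 / p)


# A Dirac mass at 0.5 versus an evenly spaced grid standing in for Uniform(0, 1).
n = 1000
dirac = [0.5] * n
uniform_grid = [(i + 0.5) / n for i in range(n)]
w2 = wasserstein_1d(dirac, uniform_grid, p=2)
print(w2)
```

The Dirac mass has no density, so a likelihood-based comparison with the uniform distribution is not available, yet the distance above is finite and close to the standard deviation of Uniform(0, 1), which is sqrt(1/12).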