Quantization summarizes a continuous distribution by computing a discrete approximation of it. One widely adopted method for data quantization is Lloyd's algorithm. It partitions the space into Voronoi cells, which can be seen as clusters, and builds a discrete distribution from their centroids and probability masses. Lloyd's algorithm estimates the centroids that are optimal in the sense of minimal expected distance. However, this approach poses significant challenges when evaluating the data is costly and the data involve rare events. In that case, the single cluster associated with no event takes most of the probability mass. A metamodel is then required, and adapted sampling methods are necessary to increase the precision of the computations on the rare clusters.
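To make the partition and update steps concrete, here is a minimal sketch of an empirical Lloyd iteration. It rests on several assumptions of ours. The distribution is represented by a finite set of samples. Distance is squared Euclidean. The centroids are initialised from `k` distinct sample points. The helper names (`sq_dist`, `assign`, `lloyd`) and the demo data are hypothetical. The sketch does not include the metamodel or adapted sampling mentioned above. It only illustrates how a "no event" atom ends up holding most of the mass.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(samples: Sequence[Point], centroids: List[Point]) -> List[List[Point]]:
    """Put each sample in the Voronoi cell of its nearest centroid."""
    cells: List[List[Point]] = [[] for _ in centroids]
    for p in samples:
        j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        cells[j].append(p)
    return cells


def lloyd(samples: Sequence[Point], k: int, n_iter: int = 100,
          seed: int = 0) -> Tuple[List[Point], List[float]]:
    """Empirical Lloyd iteration; returns centroids and cell probability masses."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct sample points (ValueError if fewer exist).
    centroids = rng.sample(sorted(set(samples)), k)
    for _ in range(n_iter):
        cells = assign(samples, centroids)
        # Move each centroid to the mean of its cell; an empty cell keeps its centroid.
        new_centroids = [
            tuple(sum(coords) / len(cell) for coords in zip(*cell)) if cell else c
            for c, cell in zip(centroids, cells)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Masses are the fractions of samples falling in each final cell.
    cells = assign(samples, centroids)
    masses = [len(cell) / len(samples) for cell in cells]
    return centroids, masses


# Hypothetical rare-event data: 990 "no event" samples at 0, 10 rare samples in [5, 10].
data_rng = random.Random(1)
samples = [(0.0,)] * 990 + [(data_rng.uniform(5.0, 10.0),) for _ in range(10)]
centroids, masses = lloyd(samples, k=3)
for c, m in sorted(zip(centroids, masses)):
    print(f"centroid {c[0]:.2f}  mass {m:.3f}")
```

All 990 identical "no event" samples fall in one cell, so that cell carries at least 99% of the mass. The two remaining centroids are each estimated from only a handful of rare samples. This is the imprecision that motivates adapted sampling on the rare clusters.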