Decoder-based generative machine learning algorithms such as Generative Adversarial Networks (GANs), Variational Auto-Encoders (VAEs), and Transformers show impressive results when constructing objects similar to those in a training ensemble. However, the generation of new objects relies mainly on learning the hidden structure of the training dataset and then sampling from a multi-dimensional normal variable. In particular, each sample is independent of the others, so the generator can repeatedly propose the same type of object. To remedy this drawback, we introduce a kernel-based measure quantization method that produces new objects from a given target measure by approximating it as a whole, and that can even stay away from elements already drawn from that distribution. This ensures better diversity among the produced objects. The method is tested on classic machine learning benchmarks.
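The idea of approximating a target measure as a whole while keeping away from already drawn elements can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the plain gradient descent, the use of samples to represent the target measure, and the names `mmd2`, `kernel_grad_sum`, and `quantize` are all assumptions made for illustration. The sketch places `n_new` points so that, together with the `fixed` points already drawn, their empirical measure is close to the target in squared maximum mean discrepancy (MMD).

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix k(a_i, b_j) for the rows of a and b (assumed kernel)."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def mmd2(points, target, sigma=1.0):
    """Squared MMD between the empirical measures of `points` and `target`."""
    return (gaussian_kernel(points, points, sigma).mean()
            - 2.0 * gaussian_kernel(points, target, sigma).mean()
            + gaussian_kernel(target, target, sigma).mean())

def kernel_grad_sum(x, y, sigma=1.0):
    """For each row x_j, the sum over rows y_b of the gradient of k(x_j, y_b) in x_j."""
    diff = x[:, None, :] - y[None, :, :]
    k = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))
    return -(diff * k[:, :, None]).sum(axis=1) / sigma ** 2

def quantize(target, fixed, n_new, steps=300, lr=1.0, sigma=1.0, seed=0):
    """Move n_new points by gradient descent on mmd2([fixed, new], target).

    `fixed` holds points already drawn; they are never moved. The kernel
    interaction among all points pushes the new ones away from them.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical initialization: start from randomly chosen target samples.
    new = target[rng.choice(len(target), size=n_new, replace=False)].copy()
    n_total = len(fixed) + n_new
    for _ in range(steps):
        pts = np.vstack([fixed, new])
        grad = (2.0 / n_total ** 2) * kernel_grad_sum(new, pts, sigma) \
             - (2.0 / (n_total * len(target))) * kernel_grad_sum(new, target, sigma)
        new -= lr * grad
    return new

# Usage: 5 new points in 2D, given 3 points that were already drawn.
rng = np.random.default_rng(1)
target = rng.normal(size=(200, 2))   # samples standing in for the target measure
fixed = target[:3]                   # elements already drawn
new_points = quantize(target, fixed, n_new=5)
```

Because the objective couples the new points to each other and to the fixed ones, the points are chosen jointly rather than independently, which is the property the text contrasts with independent sampling.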