This paper explores the interplay between statistics and generative artificial intelligence. Generative statistics, an integral part of the latter, aims to construct models that can {\it generate} new data, e.g. a new photo, efficiently and meaningfully across the whole of the (usually high-dimensional) sample space. Within it, the gradient-based approach is a current favourite because it effectively exploits, for this purpose, the information contained in the observed sample, e.g. an old photo. However, the observed sample often has missing data, e.g. missing bits in the old photo. To handle this situation, we propose a gradient-based algorithm for generative modelling. More importantly, our paper rigorously underpins this powerful approach by introducing a new F-entropy that is related to Fisher's divergence. (The F-entropy is also of independent interest.) This underpinning enables the gradient-based approach to expand its scope. For example, it can now provide a tool for generative model selection. Possible future directions include extensions to discrete data and to Bayesian variational inference.
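For orientation, Fisher's divergence between a data density $p$ and a model density $q_\theta$ on $\mathbb{R}^d$ is commonly written as below. This is the standard textbook form under the usual smoothness and integrability assumptions; the symbols $p$, $q_\theta$ and $D_F$ are illustrative notation assumed here rather than taken from the paper, and some authors include a factor of $1/2$.
\[
  D_F(p \,\|\, q_\theta) \;=\; \int_{\mathbb{R}^d} p(x)\, \bigl\| \nabla_x \log p(x) - \nabla_x \log q_\theta(x) \bigr\|^2 \, dx .
\]
The divergence depends on the two densities only through the gradients of their logarithms (the scores), which is presumably the sense in which the approach is gradient-based. The F-entropy itself is defined in the paper and is not reproduced here.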