This paper explores the interplay between statistics and generative artificial intelligence. Generative statistics, an integral part of the latter, aims to construct models that can {\it generate} new data efficiently and meaningfully across the whole of the (usually high-dimensional) sample space, e.g. a new photo. Within it, the gradient-based approach is a current favourite: it makes effective use, for this purpose, of the information contained in the observed sample, e.g. an old photo. However, the observed sample often contains missing data, e.g. missing bits in the old photo. To handle this situation, we propose a gradient-based algorithm for generative modelling. More importantly, our paper puts this powerful approach on a rigorous footing by introducing a new F-entropy that is related to Fisher's divergence. (The F-entropy is also of independent interest.) This underpinning enables the gradient-based approach to expand its scope; for example, it can now provide a tool for generative model selection. Possible future projects include discrete data and Bayesian variational inference.
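For orientation, Fisher's divergence between two densities is commonly written as below. This is the standard textbook form, not the paper's own statement: the symbols $p$ (data density), $q$ (model density) and $D_F$ are notation assumed here, and some authors include an extra factor of $1/2$. It does not define the F-entropy, which is introduced in the paper itself.

\[
  D_F(p \,\|\, q) \;=\; \int p(x)\, \bigl\| \nabla_x \log p(x) - \nabla_x \log q(x) \bigr\|^2 \, dx .
\]

The divergence depends on the densities only through the gradients of their logarithms, which is why normalising constants drop out and why methods built on it are described as gradient-based.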