Text-conditioned generation models are commonly evaluated on the quality of the generated data and its alignment with the input text prompt. At the same time, several applications of prompt-based generative models require sufficient diversity in the generated data, so that the models can produce image and video samples with a variety of features. However, most existing diversity metrics are designed for unconditional generative models, and thus cannot distinguish diversity arising from variations in the text prompts from diversity contributed by the generative model itself. In this work, our goal is to quantify the prompt-induced and model-induced diversity in samples generated by prompt-based models. We propose an information-theoretic approach for internal diversity quantification, where we decompose the kernel-based entropy $H(X)$ of the generated data $X$ into the sum of the conditional entropy $H(X|T)$, given the text variable $T$, and the mutual information $I(X; T)$ between the text and data variables. We introduce the \emph{Conditional-Vendi} score based on $H(X|T)$ to quantify the internal diversity of the model and the \emph{Information-Vendi} score based on $I(X; T)$ to measure the statistical relevance between the generated data and text prompts. We provide theoretical results to statistically interpret these scores and relate them to the unconditional Vendi score. We conduct several numerical experiments to show the correlation between the Conditional-Vendi score and the internal diversity of text-conditioned generative models. The codebase is available at \href{https://github.com/mjalali/conditional-vendi}{https://github.com/mjalali/conditional-vendi}.
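The decomposition $H(X) = H(X|T) + I(X; T)$ described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked codebase for that) and rests on several assumptions that the abstract does not confirm: a Gaussian kernel on precomputed embeddings, the order-1 (von Neumann) entropy of the trace-normalized kernel matrix, the Hadamard product of the two kernel matrices as the joint kernel, and exponentiation of each entropy term in the style of the unconditional Vendi score. The names `gaussian_kernel`, `matrix_entropy`, and `vendi_decomposition` are hypothetical.

```python
import numpy as np


def gaussian_kernel(Z, sigma=1.0):
    """Gaussian (RBF) kernel matrix of the rows of Z; the diagonal is 1."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T), 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def matrix_entropy(K):
    """Order-1 (von Neumann) entropy of the trace-normalized kernel matrix."""
    eig = np.linalg.eigvalsh(K / np.trace(K))
    eig = eig[eig > 1e-12]  # drop numerically zero eigenvalues
    return float(-np.sum(eig * np.log(eig)))


def vendi_decomposition(X, T, sigma_x=1.0, sigma_t=1.0):
    """Split the entropy of data embeddings X given text embeddings T.

    Assumption: the joint entropy H(X, T) is computed from the Hadamard
    (elementwise) product of the two kernel matrices.
    """
    K_x = gaussian_kernel(X, sigma_x)
    K_t = gaussian_kernel(T, sigma_t)
    h_x = matrix_entropy(K_x)
    h_t = matrix_entropy(K_t)
    h_xt = matrix_entropy(K_x * K_t)  # joint kernel (assumed construction)
    h_cond = h_xt - h_t               # H(X|T) = H(X, T) - H(T)
    mi = h_x - h_cond                 # I(X; T) = H(X) - H(X|T)
    return {
        "vendi": float(np.exp(h_x)),
        "conditional_vendi": float(np.exp(h_cond)),
        "information_vendi": float(np.exp(mi)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 8))  # stand-in for image/video embeddings
    T = rng.normal(size=(50, 4))  # stand-in for prompt embeddings
    print(vendi_decomposition(X, T))
```

Under these assumptions the additive split of the entropy becomes a multiplicative one after exponentiation, so the unconditional score returned under `"vendi"` equals the product of the `"conditional_vendi"` and `"information_vendi"` values.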