Prompt-guided generative AI models have rapidly expanded across vision and language domains, producing realistic and diverse outputs from textual inputs. The growing variety of such models, trained with different data and architectures, calls for principled methods to identify which types of prompts lead to distinct model behaviors. In this work, we propose PromptSplit, a kernel-based framework for detecting and analyzing prompt-dependent disagreement between generative models. For each compared model pair, PromptSplit constructs a joint prompt--output representation by forming tensor-product embeddings of the prompt and image (or text) features, and then computes the corresponding kernel covariance matrix. We use the eigenspace of the weighted difference between these matrices to identify the principal directions of behavioral difference across prompts. To ensure scalability, we employ a random-projection approximation that reduces the computational complexity to $O(nr^2 + r^3)$ for projection dimension $r$. We further provide a theoretical analysis showing that this approximation yields an eigenstructure estimate whose expected deviation from the full-dimensional result is bounded by $O(1/r^2)$. Experiments in text-to-image, text-to-text, and image-captioning settings show that PromptSplit accurately detects ground-truth behavioral differences and isolates the prompts responsible, offering an interpretable tool for locating where generative models disagree.
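The pipeline described above can be sketched in a few lines of linear algebra. The following is a minimal, hypothetical illustration (not the authors' implementation): it assumes precomputed prompt embeddings `P` and per-model output embeddings `X1`, `X2`, forms row-wise tensor-product features, randomly projects them to $r$ dimensions, and takes the top eigendirections of the covariance difference; the unweighted difference `C1 - C2` and the prompt-scoring heuristic are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def tensor_embed(P, X):
    # Row-wise tensor (outer) product: z_i = p_i (x) x_i, flattened per sample.
    n = P.shape[0]
    return np.einsum('ip,ix->ipx', P, X).reshape(n, -1)

def promptsplit_directions(P, X1, X2, r=64, k=3):
    """Hypothetical sketch of the PromptSplit core computation.

    P      : (n, d_p) prompt embeddings
    X1, X2 : (n, d_x) output embeddings from the two compared models
    r      : random-projection dimension; stats afterwards cost
             O(n r^2) for covariances plus O(r^3) for the eigensolve
    k      : number of behavioral-difference directions to return
    """
    Z1, Z2 = tensor_embed(P, X1), tensor_embed(P, X2)
    d = Z1.shape[1]
    R = rng.normal(size=(d, r)) / np.sqrt(r)   # random projection matrix
    Y1, Y2 = Z1 @ R, Z2 @ R                    # project joint features to r dims
    C1 = np.cov(Y1, rowvar=False)              # per-model covariance in r dims
    C2 = np.cov(Y2, rowvar=False)
    evals, evecs = np.linalg.eigh(C1 - C2)     # symmetric difference matrix
    order = np.argsort(-np.abs(evals))[:k]     # largest |eigenvalue| first
    # Heuristic: score each prompt by how far its model-to-model output gap
    # extends along the top difference direction.
    scores = np.abs((Y1 - Y2) @ evecs[:, order[0]])
    return evals[order], evecs[:, order], scores
```

High-magnitude entries of `scores` would then point at the prompts most responsible for the disagreement, which is the interpretability use-case the abstract describes.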