Understanding the behavior of black-box large language models and determining effective means of comparing their performance are key tasks in modern machine learning. We consider how large language models respond to a specific query by analyzing how the distributions of responses vary over different values of tuning parameters. We frame this problem in a general mathematical setting, treating the mapping from model parameters to response distributions as a structured family of probability measures, endowed with a geometry via a dissimilarity measure. We show how dissimilarities between response distributions can be represented in low-dimensional Euclidean space through a joint Euclidean mirror surface that encodes the underlying geometry. This representation permits both qualitative and quantitative analysis of large language models and provides insight into predicting response distributions for different values of tuning parameters. We propose an estimation procedure for the underlying joint Euclidean mirror based on observed samples from the response distributions, and we prove its asymptotic properties. Additionally, we propose a statistically consistent procedure to infer the value of an unknown model parameter based on samples from the corresponding response distribution and the estimated joint Euclidean mirror. In an experimental setting with large language models, we find that changes in different tuning parameters correspond to distinct directions in the embedding space, making it possible to estimate the tuning parameters that were used to generate a given response.
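The pipeline described above (samples from response distributions, pairwise dissimilarities, a low-dimensional Euclidean representation, then inference of an unknown parameter) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's stated method: simulated Gaussian vectors stand in for embedded model responses, the distance between sample means stands in for the dissimilarity measure, classical multidimensional scaling stands in for the mirror estimator, and the names `classical_mds`, `pairwise_mean_distance`, and `infer_parameter` are hypothetical.

```python
import numpy as np

def classical_mds(D, dim):
    """Embed n points in R^dim from an (n, n) matrix of pairwise dissimilarities."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]         # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(vals[top], 0.0, None))
    return vecs[:, top] * scale

def pairwise_mean_distance(samples):
    """Assumed dissimilarity: Euclidean distance between sample means."""
    means = np.stack([s.mean(axis=0) for s in samples])
    diff = means[:, None, :] - means[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def infer_parameter(new_sample, samples, thetas, dim=1):
    """Embed the new sample jointly with the grid and return the nearest grid value."""
    all_samples = list(samples) + [new_sample]
    coords = classical_mds(pairwise_mean_distance(all_samples), dim)
    grid, query = coords[:-1], coords[-1]
    nearest = np.argmin(np.linalg.norm(grid - query, axis=1))
    return thetas[nearest]

# Simulated stand-in for embedded responses: the mean of each response
# distribution moves along one direction as the tuning parameter varies.
rng = np.random.default_rng(0)
thetas = np.linspace(0.0, 1.0, 11)             # hypothetical parameter grid
direction = rng.normal(size=8)
direction /= np.linalg.norm(direction)
samples = [10.0 * t * direction + rng.normal(size=(400, 8)) for t in thetas]

# Estimated one-dimensional mirror: one coordinate per parameter value.
mirror = classical_mds(pairwise_mean_distance(samples), dim=1)

# Responses generated at an "unknown" parameter value (0.5 in this simulation).
new_sample = 10.0 * 0.5 * direction + rng.normal(size=(400, 8))
theta_hat = infer_parameter(new_sample, samples, thetas)
```

In this toy setting the estimated mirror coordinate varies almost linearly with the parameter, up to an arbitrary sign, so the nearest grid point in the embedding gives a reasonable estimate of the unknown value. With several tuning parameters, the same steps would be run over a grid of parameter combinations with a larger `dim`.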