Access to modern generative systems is often restricted to querying an API (the ``black-box'' setting), and many properties of the system are unknown to the user at inference time. Recent work has shown that low-dimensional representations of models, built from the relationship between their embedded responses to a set of queries, are useful for inferring model-level properties; however, the quality of these representations is highly sensitive to the choice of query set. We introduce the \emph{discriminative factorization} to distinguish between high- and low-quality query sets in the context of black-box model-level classification. Under this framework, the probability of chance-level classification decays exponentially in the query budget. On three auditing tasks, estimated factorization parameters predict the empirical performance decay rate. We conclude by showing that query sets selected using the estimated discriminative field reproduce the empirical ordering of oracle query sets.