We propose a distributional framework for assessing the socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new statistical relative test based on first- and second-order stochastic dominance of real-valued random variables. We show that the second-order statistics in this test are linked to mean-risk models, which are commonly used in econometrics and mathematical finance to balance risk and utility when choosing between alternatives. Using this framework, we formally develop a risk-aware approach to foundation model selection under guardrails quantified by specified metrics. Inspired by portfolio optimization and selection theory in mathematical finance, we define a \emph{metrics portfolio} for each model as a means of aggregating a collection of metrics, and we perform model selection based on the stochastic dominance of these portfolios. The statistical significance of our tests is supported theoretically by an asymptotic analysis via central limit theorems and is instantiated in practice via a bootstrap variance estimate. We use our framework to compare various large language models with respect to the risks of drifting from instructions and generating toxic content.
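To make the pipeline above concrete, here is a minimal pairwise sketch under several assumptions. The abstract does not specify the test statistic, the portfolio definition, or the decision threshold, so this sketch substitutes the violation-ratio statistic used for "almost stochastic dominance": it compares quantile curves at first order and integrated quantile curves (tail means) at second order, and it estimates the statistic's standard error by bootstrap resampling. The aggregation in `metrics_portfolio` (a weighted mean of per-sample metrics), the `threshold` tolerance, and all function names (`quantiles_on_grid`, `integrated`, `violation_ratio`, `dominance_violation`, `bootstrap_test`, `metrics_portfolio`) are hypothetical illustrations, not the authors' method. The data in the usage block are synthetic.

```python
import random
import statistics


def quantiles_on_grid(sample, grid_size=200):
    """Empirical quantile function F^{-1}(p) at p = (k + 0.5) / grid_size."""
    xs = sorted(sample)
    n = len(xs)
    return [xs[min(int((k + 0.5) / grid_size * n), n - 1)] for k in range(grid_size)]


def integrated(q):
    """Running integral of a gridded quantile curve: int_0^p F^{-1}(t) dt.

    Dividing the value at p by p gives the mean of the worst p-fraction of
    outcomes (a CVaR-style tail mean), which is where second-order dominance
    meets mean-risk models.
    """
    out, acc = [], 0.0
    for v in q:
        acc += v / len(q)
        out.append(acc)
    return out


def violation_ratio(fx, fy):
    """Fraction of the L1 gap between two curves where fx lies below fy.

    0.0 means fx >= fy everywhere (X dominates); 1.0 means fy >= fx everywhere.
    """
    diffs = [a - b for a, b in zip(fx, fy)]
    total = sum(abs(d) for d in diffs)
    if total == 0:
        return 0.5  # identical curves: no evidence either way
    return sum(-d for d in diffs if d < 0) / total


def dominance_violation(x, y, order=1, grid_size=200):
    """Violation ratio for 'X dominates Y' at first or second order (higher = better)."""
    qx, qy = quantiles_on_grid(x, grid_size), quantiles_on_grid(y, grid_size)
    if order == 2:
        qx, qy = integrated(qx), integrated(qy)
    return violation_ratio(qx, qy)


def bootstrap_test(x, y, order=1, threshold=0.5, n_boot=500, seed=0):
    """Test H0: eps >= threshold against H1: eps < threshold with a bootstrap std. error.

    Returns (eps, std_err, z); z below about -1.645 rejects H0 at the 5% level
    (one-sided, normal approximation), i.e. suggests X (almost) dominates Y.
    The default threshold is a hypothetical tolerance, not a value from the paper.
    """
    rng = random.Random(seed)
    eps = dominance_violation(x, y, order)
    boots = [
        dominance_violation(rng.choices(x, k=len(x)), rng.choices(y, k=len(y)), order)
        for _ in range(n_boot)
    ]
    se = max(statistics.stdev(boots), 1e-12)  # guard against a degenerate bootstrap
    return eps, se, (eps - threshold) / se


def metrics_portfolio(metric_rows, weights):
    """Hypothetical aggregation: weighted mean of each sample's metric values.

    Assumes every metric is already on a comparable scale and oriented so that
    higher means safer (e.g. 1 - toxicity).
    """
    total_w = sum(weights)
    return [sum(w * m for w, m in zip(weights, row)) / total_w for row in metric_rows]


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic per-prompt scores for two hypothetical models on two metrics.
    model_a = [[rng.gauss(0.80, 0.1), rng.gauss(0.70, 0.1)] for _ in range(300)]
    model_b = [[rng.gauss(0.72, 0.1), rng.gauss(0.68, 0.1)] for _ in range(300)]
    pa = metrics_portfolio(model_a, [0.5, 0.5])
    pb = metrics_portfolio(model_b, [0.5, 0.5])
    for order in (1, 2):
        eps, se, z = bootstrap_test(pa, pb, order=order)
        print(f"order {order}: eps={eps:.3f}, se={se:.3f}, z={z:.2f}")
```

Working with quantile curves rather than CDFs keeps both orders in one code path: first-order dominance compares F^{-1} pointwise, while second-order dominance compares its running integral, so a model can pass the second-order check by having better tail means even when its quantile curve crosses the competitor's.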