We propose a distributional framework for assessing socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new relative statistical test based on first- and second-order stochastic dominance of real random variables. We show that the second-order statistics in this test are linked to mean-risk models commonly used in econometrics and mathematical finance to balance risk and utility when choosing between alternatives. Using this framework, we formally develop a risk-aware approach to foundation model selection, given guardrails quantified by specified metrics. Inspired by portfolio optimization and selection theory in mathematical finance, we define a metrics portfolio for each model as a means of aggregating a collection of metrics, and we perform model selection based on the stochastic dominance of these portfolios. The statistical significance of our tests is backed theoretically by an asymptotic analysis via central limit theorems, and is instantiated in practice via a bootstrap variance estimate. We use our framework to compare various large language models with respect to the risks of drifting from instructions and outputting toxic content.
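To make the idea of comparing two models by stochastic dominance concrete, the following is a minimal sketch and not the test developed in this work. It assumes that higher metric values are better, compares empirical quantile curves (first order) or their running integrals (second order) on a fixed grid, and summarizes the comparison with a violation ratio, a common relaxation of strict dominance. The function names, the grid, and the synthetic scores are all hypothetical, and the bootstrap here only estimates the spread of the statistic.

```python
import numpy as np

def quantile_curve(samples, grid, order=1):
    """Empirical quantiles (order 1) or their running integral (order 2)."""
    q = np.quantile(samples, grid)
    if order == 1:
        return q
    # Riemann-sum approximation of the integrated quantile function,
    # assuming an evenly spaced grid.
    return np.cumsum(q) * (grid[1] - grid[0])

def violation_ratio(a, b, grid, order=1):
    """Share of the curve gap where a falls below b (0 means a dominates b)."""
    diff = quantile_curve(a, grid, order) - quantile_curve(b, grid, order)
    total = np.abs(diff).sum()
    if total == 0:
        return 0.0
    return float(np.clip(-diff, 0.0, None).sum() / total)

def bootstrap_std(a, b, grid, order=1, n_boot=200, seed=0):
    """Bootstrap estimate of the standard deviation of the violation ratio."""
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        ra = rng.choice(a, size=len(a), replace=True)
        rb = rng.choice(b, size=len(b), replace=True)
        stats.append(violation_ratio(ra, rb, grid, order))
    return float(np.std(stats, ddof=1))

# Synthetic per-example scores for two hypothetical models (higher is better).
rng = np.random.default_rng(42)
scores_a = rng.normal(1.0, 1.0, size=2000)
scores_b = rng.normal(0.0, 1.0, size=2000)
grid = np.linspace(0.01, 0.99, 99)

for order in (1, 2):
    eps = violation_ratio(scores_a, scores_b, grid, order)
    se = bootstrap_std(scores_a, scores_b, grid, order)
    print(f"order {order}: violation ratio = {eps:.3f}, bootstrap std = {se:.3f}")
```

A violation ratio near 0 suggests that the first model dominates the second at the given order, while a value near 1 suggests the reverse. Turning this into a significance test would require the asymptotic analysis described above, which the sketch does not attempt.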