In a world of increasingly prevalent closed-source commercial machine learning models, model evaluations from developers must be taken at face value. These benchmark results, whether over task accuracy, bias evaluations, or safety checks, have traditionally been impossible for a model end-user to verify without the costly or infeasible process of re-running the benchmark on black-box model outputs. This work presents a method of verifiable model evaluation using model inference through zkSNARKs. The resulting zero-knowledge computational proofs of model outputs over datasets can be packaged into verifiable evaluation attestations showing that models with fixed private weights achieve stated performance or fairness metrics over public inputs. These verifiable attestations can be produced for any standard neural network model, with varying compute requirements. For the first time, we demonstrate this across a sample of real-world models and highlight key challenges and design solutions. This presents a new transparency paradigm in the verifiable evaluation of private models.