In a world of increasingly closed-source commercial machine learning models, model evaluations from developers must be taken at face value. These benchmark results, whether over task accuracy, bias evaluations, or safety checks, are traditionally impossible for a model end-user to verify without the costly or impossible process of re-performing the benchmark on black-box model outputs. This work presents a method of verifiable model evaluation using model inference through zkSNARKs. The resulting zero-knowledge computational proofs of model outputs over datasets can be packaged into verifiable evaluation attestations, showing that models with fixed private weights achieve stated performance or fairness metrics over public inputs. We present a flexible proving system that enables verifiable attestations to be performed on any standard neural network model with varying compute requirements. For the first time, we demonstrate this across a sample of real-world models and highlight key challenges and design solutions. This establishes a new transparency paradigm for the verifiable evaluation of private models.
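To make the attestation concept concrete, the following is a minimal sketch (not the paper's implementation) of how a verifiable evaluation attestation might be structured, assuming a generic zkSNARK backend that exposes a hypothetical `prove` callable over a circuit of the model's forward pass; the commitment scheme and field names here are illustrative assumptions only.

```python
# Minimal sketch of a verifiable evaluation attestation.
# Assumption: `prove(weights, x, y)` is a placeholder for a zkSNARK prover
# that attests "the model with these (private) weights maps input x to output y";
# it is NOT a specific library API.

from dataclasses import dataclass
from hashlib import sha256
import json


@dataclass
class EvaluationAttestation:
    weights_commitment: str   # commitment to the fixed private weights
    dataset_hash: str         # hash of the public evaluation inputs
    claimed_metric: float     # e.g. accuracy or a fairness statistic
    proofs: list              # one zero-knowledge proof per (input, output) pair


def commit(data: bytes) -> str:
    """Binding commitment; a real system would use a hiding commitment scheme."""
    return sha256(data).hexdigest()


def build_attestation(weights: bytes, inputs: list, outputs: list,
                      labels: list, prove) -> EvaluationAttestation:
    """Prover side: run inference in-circuit and package proofs with the metric."""
    accuracy = sum(o == l for o, l in zip(outputs, labels)) / len(labels)
    proofs = [prove(weights, x, y) for x, y in zip(inputs, outputs)]
    return EvaluationAttestation(
        weights_commitment=commit(weights),
        dataset_hash=commit(json.dumps([x.hex() for x in inputs]).encode()),
        claimed_metric=accuracy,
        proofs=proofs,
    )
```

A verifier holding the public dataset and the published attestation would recompute the dataset hash, check each proof against the weight commitment, and recompute the metric from the proven outputs, without ever seeing the weights themselves.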