Machine learning property attestations allow provers (e.g., model providers or owners) to attest properties of their models/datasets to verifiers (e.g., regulators, customers), enabling accountability towards regulations and policies. However, current approaches do not support generative models or large datasets. We present PAL*M, a property attestation framework for large generative models, illustrated using large language models. PAL*M defines properties across training and inference, leverages confidential virtual machines with security-aware GPUs for coverage of CPU-GPU operations, and proposes using incremental multiset hashing over memory-mapped datasets to efficiently track their integrity. We implement PAL*M on Intel TDX and NVIDIA H100, showing it is efficient, scalable, versatile, and secure.
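To illustrate the idea of incremental multiset hashing mentioned above, the following is a minimal sketch of an additive multiset hash (in the style of set-hash constructions such as Clarke et al.'s MSet-Add-Hash), not the paper's exact construction: each data chunk is hashed independently, and per-chunk digests are summed in a large ring, so the accumulator is order-independent and can be updated incrementally as chunks of a memory-mapped dataset are added or removed. The class and modulus choice here are illustrative assumptions.

```python
import hashlib

# Illustrative choice: accumulate 256-bit per-chunk digests modulo 2**256.
MOD = 2**256


def chunk_digest(chunk: bytes) -> int:
    """Hash one data chunk to an integer (SHA-256, big-endian)."""
    return int.from_bytes(hashlib.sha256(chunk).digest(), "big")


class MultisetHash:
    """Incremental, order-independent hash over a multiset of chunks.

    Adding chunks in any order yields the same digest, and individual
    chunks can be removed without rehashing the whole dataset.
    """

    def __init__(self) -> None:
        self.acc = 0

    def add(self, chunk: bytes) -> None:
        self.acc = (self.acc + chunk_digest(chunk)) % MOD

    def remove(self, chunk: bytes) -> None:
        self.acc = (self.acc - chunk_digest(chunk)) % MOD

    def digest(self) -> int:
        return self.acc
```

Because updates commute, a verifier can check the integrity of a large memory-mapped dataset chunk by chunk, in any access order, against a single accumulated value.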